This repository contains the trained models and datasets for the paper "Machine Learning Classification of Trajectories from Molecular Dynamics Simulations of Chromosome Segregation" (submitted but not yet accepted). The results reported in the paper can be reproduced with the data and trained models stored here. The accompanying code can be found in the GitHub repository https://github.com/DavidGeisel/ML_Classification_MD_Trajectories.

To reproduce the results, clone the GitHub repository into a local folder such as /home/.../git_repo/. Then copy the data stored in this repository into /home/.../git_repo/ML-models/ to end up with the following directory structure:

/home/.../git_repo/evaluation/ (from cloning the git repo)
/home/.../git_repo/ML-models/ (data downloaded from this repo)
/home/.../git_repo/ML-models/svm (data for the support vector machine classifier)
/home/.../git_repo/ML-models/random-forest (data for the random forest classifier)
/home/.../git_repo/ML-models/logistic-regression (data for the logistic regression classifier)
/home/.../git_repo/ML-models/gradient-boosting (data for the gradient boosting classifier)

With this layout, the functions and predefined paths in the Jupyter notebook should work.

The data stored in this repository consists of the following file types:
- .npy (= NumPy arrays, can be loaded with numpy.load() in Python)
- .joblib (= trained ML models, can be loaded with joblib.load() in Python)
- .sav (= trained models, can be loaded with pickle.load() in Python)

The data in this repository is structured as follows:

1.) The data for the four different ML models is found in the respective folders:
ML-models/svm/
ML-models/random-forest/
ML-models/logistic-regression/
ML-models/gradient-boosting/

2.)
Each of these folders is divided into:
ML-models/<classifier>/complete-trajectories/ (= classification of complete trajectories with either high- or low-dimensional input vectors)
ML-models/<classifier>/short-trajectories/ (= classification of trajectories of reduced length or varied temporal resolutions)

3.) For every trained model, the corresponding train and test data with labels is provided:
- X_train....npy
- X_test...npy
- y_train_binary...npy
- y_test_binary...npy