Dual Adaptive Sampling for Machine Learning Interatomic potential.

Related tags

Machine Learningdas
Overview

DAS

Dual Adaptive Sampling for Machine Learning Interatomic potential.

How to cite

If you use this code in your research, please cite this using: Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, and Wenqing Zhang. Dual adaptive sampling and machine learning interatomic potentials for modeling materials with chemical bond hierarchy. Phys. Rev. B 104, 094310 (2021).

Install

Install pymtp

You should first install the python interface for mtp: https://github.com/hlyang1992/pymtp

Install das

You can download the code by

git clone https://github.com/hlyang1992/das
cd das
cp -r <path-to-mlip-2>/untrained_mtps/*.mtp das/utils/untrained_mtps

Then remove the redundant settings from each mtp file. Only the following settings can be retained for each mtp file:

radial_funcs_count = 
alpha_moments_count = 
alpha_index_basic_count = 
alpha_index_basic = 
alpha_index_times_count = 
alpha_index_times = 
alpha_scalar_moments = 
alpha_moment_mapping =

Install das by

cd <path-to-das>
pip install -r requirements.txt
pip install .

Usage

das  config_dir  job_name

Configuration

The configuration directory config_dir must contain the configuration file conf.yaml, which controls all sampling processes. The conf.yaml file should look like the following:

"global_settings":

"machine_settings":

"selector_settings": {} 

"labeler_settings":

"trainer_settings":

"sampler_settings":

"init_conf_setting":

"iter_params_template":

"iter_params":
  • global_settings:
"global_settings":
  # The elements in the system, the order of the elements does not matter, the program automatically numbers the 
  # atomic types according to their atomic number from smallest to largest.
  "unique_elements": [ "Co", "Sb" ]
  # path to VASP Pseudopotential Database, see detail at https://wiki.fysik.dtu.dk/ase/ase/calculators/vasp.html#vasp
  "vasp_pp_path": "path_to_directory" 
  • machine_settings:

All time-consuming computational tasks such as sampling, labeling, and training can be dispatched to designated machines via ssh. Currently only LSF is supported and migration to other job management systems is very easy.

"machine_settings":
  "machine_1":
    # The supported machine types are now: `machine_lsf`, `machine_shell`
    "machine_type": "machine_lsf"
    "host": "ip address"
    "user": "username"
    "password": "password"
    # Exclude these nodes when submitting tasks.
    "bad_nodes": [ ] # #BSUB -R "hname!={{node}}"
    "port": 22
    # number of cores for each task
    "n_cores": 40 # #BSUB -n {{ncores}}
    "n_tasks": 40 # The maximum number of tasks to run simultaneously.
    "q_name": "short" # #BSUB -q {{q_name}}
    "env_source_file": "env.sh" # env.sh is in the config_dir
    "run_dir": "path-to-run-directory-in-target"
    "extra_params":
      "vasp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP vasp"
      "lmp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP lmp_mlp"
      "mlip_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP mlp train"
      "python_cmd": "absolute path to python path"
  "machine_2":
    # setting for machchine_2
    "machine_type": "machine_lsf"
    # ...

You should prepare a file to set the environment variables. The program will source this file to set the environment variables after connecting to the machine via ssh. For technical reasons please see: The remote shell environment doesn’t match interactive shells

  • sampler_settings
"scale_1":
  "kind": "scale_box"
  "scale_factors": [0.998, 0.9985, 0.999]
"scale_2":
  "kind": "scale_box"
  "scale_factors": [[0.998, 0.9985, 0.999, 0.997], # a
                    [1.002, 1.003, 1.004, 1.005],  # b
                    [0.997, 0.995, 0.999, 0.996]] # c
"nvt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_1"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "prev_steps": [ 0 ]
 
"npt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_2"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "press": [100, 200] # bar
    "prev_steps": [ 0 ]
  • "labeler_settings"

We use ase to generate input files (INCAR, POTCAR, KPOINTS) for VASP calculation. Please see detail at Ase vasp calculator

"labeler_settings":
  "vasp":
    "kind": "vasp"
    "machine": "ty_label"
    "vasp_parms":
      "xc": "pbe"
      "prec": "A"
      # other setting for vasp calculations
  • "trainer_settings"
"trainer_settings":
  "train_5_model":
    "kind": "mtp_trainer"
    "machine": "ty_train" 
    "model_index": 18 
    "min_dist": 1.39 
    "max_dist": 5.0
    "n_models": 5 
    "train_from_prev_model": true 
  • init_conf_setting:
"init_conf_setting":
  "-1": [ "init_MD.cfg" ]
  "-2": [ "init_1.vasp" ]
  "-3": [ "init_2.vasp" ]
  • iter_params_template:
"iter_params_template":
  "0":
    "init_conf": [ -1 ]
    "sampler": [ ]
    "selector": [ ]
    "labeler": [ ]
    "trainer": [ "train_5_model" ]
  "10":
    "init_conf": [ -2 ]
    "sampler": [ "scale_0", "nvt_0" ]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "20":
    "init_conf": [ -3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "30":
    "init_conf": [ -2,-3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  • iter_params:
"iter_params":
  [
    [ "0" ],
    # If the last one is LOOP, repeat all the previous ones until convergence.
    ["10", "LOOP"], 
    ["30", "LOOP"],
    ["10", "10"]  
    ["20"],
  ]
This is a Machine Learning model which predicts the presence of Diabetes in Patients

Diabetes Disease Prediction This is a machine Learning mode which tries to determine if a person has a diabetes or not. Data The dataset is in comma s

Edem Gold 4 Mar 16, 2022
Simulation of early COVID-19 using SIR model and variants (SEIR ...).

COVID-19-simulation Simulation of early COVID-19 using SIR model and variants (SEIR ...). Made by the Laboratory of Sustainable Life Assessment (GYRO)

José Paulo Pereira das Dores Savioli 1 Nov 17, 2021
Implementation of linesearch Optimization Algorithms in Python

Nonlinear Optimization Algorithms During my time as Scientific Assistant at the Karlsruhe Institute of Technology (Germany) I implemented various Opti

Paul 3 Dec 06, 2022
Implementations of Machine Learning models, Regularizers, Optimizers and different Cost functions.

Linear Models Implementations of LinearRegression, LassoRegression and RidgeRegression with appropriate Regularizers and Optimizers. Linear Regression

Keivan Ipchi Hagh 1 Nov 22, 2021
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

Christoph Mark 129 Dec 24, 2022
Kaggle Competition using 15 numerical predictors to predict a continuous outcome.

Kaggle-Comp.-Data-Mining Kaggle Competition using 15 numerical predictors to predict a continuous outcome as part of a final project for a stats data

moisey alaev 1 Dec 28, 2021
Customers Segmentation with RFM Scores and K-means

Customer Segmentation with RFM Scores and K-means RFM Segmentation table: K-Means Clustering: Business Problem Rule-based customer segmentation machin

5 Aug 10, 2022
A flexible CTF contest platform for coming PKU GeekGame events

Project Guiding Star: the Backend A flexible CTF contest platform for coming PKU GeekGame events Still in early development Highlights Not configurabl

PKU GeekGame 14 Dec 15, 2022
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort

Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort

2.3k Jan 04, 2023
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023
ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions

ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions, in particular, the posterior distributions of Bayesian models in

Computational Data Science Lab 182 Dec 31, 2022
Climin is a Python package for optimization, heavily biased to machine learning scenarios

climin climin is a Python package for optimization, heavily biased to machine learning scenarios distributed under the BSD 3-clause license. It works

Biomimetic Robotics and Machine Learning at Technische Universität München 177 Sep 02, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.6k Jan 02, 2023
A collection of video resources for machine learning

Machine Learning Videos This is a collection of recorded talks at machine learning conferences, workshops, seminars, summer schools, and miscellaneous

Dustin Tran 1.5k Dec 29, 2022
NumPy-based implementation of a multilayer perceptron (MLP)

My own NumPy-based implementation of a multilayer perceptron (MLP). Several of its components can be tuned and played with, such as layer depth and size, hidden and output layer activation functions,

1 Feb 10, 2022
A scikit-learn based module for multi-label et. al. classification

scikit-multilearn scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on-top of various scientific Pyth

802 Jan 01, 2023
Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Sean Zahller 1 Feb 04, 2022
Python implementation of Weng-Lin Bayesian ranking, a better, license-free alternative to TrueSkill

Python implementation of Weng-Lin Bayesian ranking, a better, license-free alternative to TrueSkill This is a port of the amazing openskill.js package

Open Debates Project 156 Dec 14, 2022
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Chris Santiago 0 Mar 30, 2022
Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led

Brett Vogelsang 2 Jan 18, 2022