An SE(3)-invariant autoencoder for generating the periodic structure of materials

Last update: Dec 10, 2022

Crystal Diffusion Variational AutoEncoder

This software implements the Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic structure of materials.

It has two main functionalities:

Generate novel, stable materials by learning from a dataset containing existing material structures.
Generate materials by optimizing a specific property in the latent space, i.e. inverse design.

[Paper] [Datasets]

Installation
Datasets
Training CDVAE
Generating materials
Evaluating model
Authors and acknowledgements
Citation
Contact

Installation

The easiest way to install prerequisites is via conda.

Pre-install step

Install conda-merge:

pip install conda-merge

Check that you can invoke conda-merge by running conda-merge -h.

GPU machines

Run the following command to install the environment:

conda-merge env.common.yml env.gpu.yml > env.yml
conda env create -f env.yml

Activate the conda environment with conda activate cdvae.

Then install this package in editable mode: pip install -e .

CPU-only machines

conda-merge env.common.yml env.cpu.yml > env.yml
conda env create -f env.yml
conda activate cdvae
pip install -e .

Setting up environment variables

Make a copy of the .env.template file and rename it to .env. Modify the following environment variables in .env.

PROJECT_ROOT: path to the folder that contains this repo
HYDRA_JOBS: path to a folder to store hydra outputs
WABDB: path to a folder to store wandb (Weights & Biases) outputs
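
As a quick sanity check before training, the sketch below reads a .env file and reports which of the three variables listed above are missing or empty. The line format it accepts (KEY=value, optionally prefixed with export and optionally quoted) is an assumption about how .env is written, not something taken from .env.template, and the helper names parse_env_file and missing_vars are hypothetical.

```python
from pathlib import Path

# Variable names exactly as listed in this README.
REQUIRED_VARS = ("PROJECT_ROOT", "HYDRA_JOBS", "WABDB")


def parse_env_file(path):
    """Parse KEY=value lines from a .env file (assumed layout, see lead-in)."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

Calling missing_vars(parse_env_file(".env")) returns an empty list when all three variables are set.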

Datasets

All datasets are available in data/ with train/validation/test splits, so you don't need to download them again. If you use these datasets, please consider citing the original papers from which we curated them.

Find more about these datasets by going to our Datasets page.

Training CDVAE

Training without a property predictor

To train a CDVAE, run the following command:

python cdvae/run.py data=perov expname=perov

To use other datasets, use data=carbon or data=mp_20 instead. CDVAE uses hydra to configure hyperparameters, and users can modify them on the command line or by editing the configuration files in the conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname.
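
The path pattern above can be assembled programmatically. The following is a minimal sketch, assuming HYDRA_JOBS is set in the environment and that the date component is the day the run started. The helper names run_dir and list_checkpoints are hypothetical, and the "*.ckpt" file pattern is an assumption, since this README does not state the checkpoint file extension.

```python
import os
from datetime import date
from pathlib import Path


def run_dir(expname, run_date=None, hydra_jobs=None):
    """Build $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname for a given run."""
    hydra_jobs = hydra_jobs or os.environ["HYDRA_JOBS"]
    run_date = run_date or date.today()
    return Path(hydra_jobs) / "singlerun" / run_date.strftime("%Y-%m-%d") / expname


def list_checkpoints(directory, pattern="*.ckpt"):
    """List checkpoint files in a run folder (the extension is an assumption)."""
    return sorted(Path(directory).glob(pattern))
```

For example, run_dir("perov") points at today's folder for the perov experiment, and the result can be passed as MODEL_PATH to the scripts described below.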

Training with a property predictor

Users can additionally train an MLP property predictor on the latent space, which is required for the property optimization task:

python cdvae/run.py data=perov expname=perov model.predict_property=True

The name of the predicted property is defined in data.prop, as in conf/data/perov.yaml for Perov-5.
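
When launching several runs from a script, the two training commands above can be generated from one place. This is only a sketch: build_train_command is a hypothetical helper, and it uses just the overrides shown in this README (data, expname, model.predict_property).

```python
import shlex


def build_train_command(data, expname, predict_property=False):
    """Return the training command as an argument list for subprocess.run."""
    cmd = ["python", "cdvae/run.py", f"data={data}", f"expname={expname}"]
    if predict_property:
        # Also trains the MLP property predictor needed for the opt task.
        cmd.append("model.predict_property=True")
    return cmd


# Print the command for the MP-20 dataset with a property predictor.
print(shlex.join(build_train_command("mp_20", "mp_20", predict_property=True)))
```

The returned list can be passed directly to subprocess.run from the repository root.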

Generating materials

To generate materials, run the following command:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon gen opt

MODEL_PATH is the path to the trained model. Users can choose one or more of the three tasks:

recon: reconstruct all materials in the test data. Outputs can be found in eval_recon.pt.
gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
opt: generate new material structures by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, and eval_opt.pt are PyTorch pickle files containing multiple tensors that describe the structures of M materials batched together. Each material can have a different number of atoms, and we assume there are N atoms in total. num_evals denotes the number of Langevin dynamics runs performed for each material.

frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
atom_types: atomic number of each atom, shape (num_evals, N)
lengths: the lengths of the lattice, shape (num_evals, M, 3)
angles: the angles of the lattice, shape (num_evals, M, 3)
num_atoms: the number of atoms in each material, shape (num_evals, M)
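
Because atoms from all M materials share one axis of length N, num_atoms is what separates them again. The sketch below splits one evaluation sample (one index along num_evals) into per-material records. It assumes that atoms are stored contiguously in material order, which the shapes above suggest but this README does not state explicitly. split_materials is a hypothetical helper that works on any sliceable sequence, including nested lists.

```python
from itertools import accumulate


def split_materials(frac_coords, atom_types, lengths, angles, num_atoms):
    """Split one batched sample into a list of per-material dicts.

    frac_coords: (N, 3), atom_types: (N,), lengths/angles: (M, 3),
    num_atoms: (M,). Atoms are assumed to be grouped by material.
    """
    counts = [int(n) for n in num_atoms]
    ends = list(accumulate(counts))  # running end offsets along the N axis
    if ends and ends[-1] != len(frac_coords):
        raise ValueError("num_atoms does not sum to the number of atoms N")
    materials = []
    for i, (count, end) in enumerate(zip(counts, ends)):
        start = end - count
        materials.append({
            "frac_coords": frac_coords[start:end],
            "atom_types": atom_types[start:end],
            "lengths": lengths[i],
            "angles": angles[i],
        })
    return materials
```

With a file loaded through torch.load, the same function could be applied to the tensors for a single evaluation, for example the slice at index 0 of each field. How the fields are keyed inside the file is not specified here, so check the loaded object before indexing it.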

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH is the path to the trained model. All evaluation metrics are saved in eval_metrics.json.
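
To inspect the results afterwards, the metrics file can be read with the standard json module. This sketch assumes that eval_metrics.json is written inside MODEL_PATH and that it holds a JSON object, possibly nested. Neither detail is stated in this README, and load_metrics and flatten are hypothetical helpers.

```python
import json
from pathlib import Path


def load_metrics(model_path, filename="eval_metrics.json"):
    """Load the metrics file from the model folder (location is an assumption)."""
    with open(Path(model_path) / filename) as f:
        return json.load(f)


def flatten(metrics, prefix=""):
    """Flatten nested dicts into dotted keys so they print one per line."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat
```

Iterating over flatten(load_metrics(MODEL_PATH)).items() then gives one metric name and value per step.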

Authors and acknowledgements

The software is primarily written by Tian Xie, with significant contributions from Xiang Fu.

The GNN codebase and many utility functions are adapted from ocp-models by the Open Catalyst Project. In particular, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

For the datasets, Perov-5 is curated from Perovskite water-splitting, Carbon-24 is curated from AIRSS data for carbon at 10 GPa, and MP-20 is curated from Materials Project.

Citation

Please consider citing the following paper if you find our code & data useful.

@article{xie2021crystal,
  title={Crystal Diffusion Variational Autoencoder for Periodic Material Generation},
  author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2110.06197},
  year={2021}
}

Contact

Please leave an issue or reach out to Tian Xie (txie AT csail DOT mit DOT edu) if you have any questions.
