An SE(3)-invariant autoencoder for generating the periodic structure of materials

Related tags

Deep Learningcdvae
Overview

Crystal Diffusion Variational AutoEncoder

This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic structure of materials.

It has several main functionalities:

  • Generate novel, stable materials by learning from a dataset containing existing material structures.
  • Generate materials by optimizing a specific property in the latent space, i.e. inverse design.

[Paper] [Datasets]

Table of Contents

Installation

The easiest way to install prerequisites is via conda.

Pre-install step

Install conda-merge:

pip install conda-merge

Check that you can invoke conda-merge by running conda-merge -h.

GPU machines

Run the following command to install the environment:

conda-merge env.common.yml env.gpu.yml > env.yml
conda env create -f env.yml

Activate the conda environment with conda activate cdvae.

Install this package with pip install -e ..

CPU-only machines

conda-merge env.common.yml env.cpu.yml > env.yml
conda env create -f env.yml
conda activate cdvae
pip install -e .

Setting up environment variables

Make a copy of the .env.template file and rename it to .env. Modify the following environment variables in .env.

  • PROJECT_ROOT: path to the folder that contains this repo
  • HYDRA_JOBS: path to a folder to store hydra outputs
  • WABDB: path to a folder to store wabdb outputs

Datasets

All datasets are directly available on data/ with train/valication/test splits. You don't need to download them again. If you use these datasets, please consider to cite the original papers from which we curate these datasets.

Find more about these datasets by going to our Datasets page.

Training CDVAE

Training without a property predictor

To train a CDVAE, run the following command:

python cdvae/run.py data=perov expname=perov

To use other datasets, use data=carbon and data=mp_20 instead. CDVAE uses hydra to configure hyperparameters, and users can modify them with the command line or configure files in conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname.

Training with a property predictor

Users can also additionally train an MLP property predictor on the latent space, which is needed for the property optimization task:

python cdvae/run.py data=perov expname=perov model.predict_property=True

The name of the predicted propery is defined in data.prop, as in conf/data/perov.yaml for Perov-5.

Generating materials

To generate materials, run the following command:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. Users can choose one or several of the 3 tasks:

  • recon: reconstruction, reconstructs all materials in the test data. Outputs can be found in eval_recon.ptl
  • gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
  • opt: generate new material strucutre by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, eval_opt.pt are pytorch pickles files containing multiple tensors that describes the structures of M materials batched together. Each material can have different number of atoms, and we assume there are in total N atoms. num_evals denote the number of Langevin dynamics we perform for each material.

  • frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
  • atom_types: atomic number of each atom, shape (num_evals, N)
  • lengths: the lengths of the lattice, shape (num_evals, M, 3)
  • angles: the angles of the lattice, shape (num_evals, M, 3)
  • num_atoms: the number of atoms in each material, shape (num_evals, M)

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. All evaluation metrics will be saved in eval_metrics.json.

Authors and acknowledgements

The software is primary written by Tian Xie, with signficant contributions from Xiang Fu.

The GNN codebase and many utility functions are adapted from the ocp-models by the Open Catalyst Project. Especially, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

For the datasets, Perov-5 is curated from Perovksite water-splitting, Carbon-24 is curated from AIRSS data for carbon at 10GPa, MP-20 is curated from Materials Project.

Citation

Please consider citing the following paper if you find our code & data useful.

@article{xie2021crystal,
  title={Crystal Diffusion Variational Autoencoder for Periodic Material Generation},
  author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2110.06197},
  year={2021}
}

Contact

Please leave an issue or reach out to Tian Xie (txie AT csail DOT mit DOT edu) if you have any questions.

Owner
Tian Xie
Postdoc at MIT CSAIL. Machine learning algorithms for materials, drugs, and beyond.
Tian Xie
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Various operations like path tracking, counting, etc by using yolov5

Object-tracing-with-YOLOv5 Various operations like path tracking, counting, etc by using yolov5

Pawan Valluri 5 Nov 28, 2022
Soomvaar is the repo which 🏩 contains different collection of 👨‍💻🚀code in Python and 💫✨Machine 👬🏼 learning algorithms📗📕 that is made during 📃 my practice and learning of ML and Python✨💥

Soomvaar 📌 Introduction Soomvaar is the collection of various codes implement in machine learning and machine learning algorithms with python on coll

Felix-Ayush 42 Dec 30, 2022
Implementation of a Transformer, but completely in Triton

Transformer in Triton (wip) Implementation of a Transformer, but completely in Triton. I'm completely new to lower-level neural net code, so this repo

Phil Wang 152 Dec 22, 2022
This is the source code for generating the ASL-Skeleton3D and ASL-Phono datasets. Check out the README.md for more details.

ASL-Skeleton3D and ASL-Phono Datasets Generator The ASL-Skeleton3D contains a representation based on mapping into the three-dimensional space the coo

Cleison Amorim 5 Nov 20, 2022
[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers Installation pip install -r requirements.txt Dataset Preparation Given the

Yingchen Yu 25 Nov 09, 2022
VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation

VID-Fusion VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation Authors: Ziming Ding , Tiankai Yang, Kunyi Zhan

ZJU FAST Lab 86 Nov 18, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

187 Dec 26, 2022
Spatially-Adaptive Pixelwise Networks for Fast Image Translation, CVPR 2021

Image Translation with ASAPNets Spatially-Adaptive Pixelwise Networks for Fast Image Translation, CVPR 2021 Webpage | Paper | Video Installation insta

Tamar Rott Shaham 100 Dec 28, 2022
Code for "Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations"

Infinitely Deep Bayesian Neural Networks with SDEs This library contains JAX and Pytorch implementations of neural ODEs and Bayesian layers for stocha

Winnie Xu 95 Nov 26, 2021
Algorithmic encoding of protected characteristics and its implications on disparities across subgroups

Algorithmic encoding of protected characteristics and its implications on disparities across subgroups This repository contains the code for the paper

Team MIRA - BioMedIA 15 Oct 24, 2022
Distributed Evolutionary Algorithms in Python

DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru

Distributed Evolutionary Algorithms in Python 4.9k Jan 05, 2023
Implementation of our paper "DMT: Dynamic Mutual Training for Semi-Supervised Learning"

DMT: Dynamic Mutual Training for Semi-Supervised Learning This repository contains the code for our paper DMT: Dynamic Mutual Training for Semi-Superv

Zhengyang Feng 120 Dec 30, 2022
Code of 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces Installation After cloning the repo open

37 Dec 03, 2022
My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

machine-learning-with-graphs My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs Course materials can be

Marko Njegomir 7 Dec 14, 2022
A simple configurable bot for sending arXiv article alert by mail

arXiv-newsletter A simple configurable bot for sending arXiv article alert by mail. Prerequisites PyYAML=5.3.1 arxiv=1.4.0 Configuration All config

SXKDZ 21 Nov 09, 2022
TLDR: Twin Learning for Dimensionality Reduction

TLDR (Twin Learning for Dimensionality Reduction) is an unsupervised dimensionality reduction method that combines neighborhood embedding learning with the simplicity and effectiveness of recent self

NAVER 105 Dec 28, 2022
🥈78th place in Riiid Answer Correctness Prediction competition

Riiid Answer Correctness Prediction Introduction This repository is the code that placed 78th in Riiid Answer Correctness Prediction competition. Requ

Jungwoo Park 10 Jul 14, 2022
Chatbot in 200 lines of code using TensorLayer

Seq2Seq Chatbot This is a 200 lines implementation of Twitter/Cornell-Movie Chatbot, please read the following references before you read the code: Pr

TensorLayer Community 820 Dec 17, 2022