Improved Fitness Optimization Landscapes for Sequence Design

Last update: Dec 20, 2022

Overview

ReLSO

Improved Fitness Optimization Landscapes for Sequence Design

Description
Citation
How to run
Training models
Original data source

Description

In recent years, deep learning approaches for determining protein sequence-fitness relationships have gained traction. Advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data and consequently, attracted the application of various deep learning methods. Although these methods learn an implicit fitness landscape, there is little work on using the latent encoding directly for protein sequence optimization. Here we show that this latent space representation of a fitness landscape can be made very amenable to latent space optimization through a joint-training process. We also show that this encoding strategy which also provides improvements to generalization over more traditional training strategies. We apply our approach to several biological contexts and show that latent space optimization in a smooth learned folding landscape allows for more accurate and efficient optimization of protein sequences.

Citation

This repo accompanies the following publication:

Egbert Castro, Abhinav Godavarthi, Julien Rubinfien, Smita Krishnaswamy. Guided Generative Protein Design using Regularized Transformers. Nature Machine Intelligence, in review (2021).

How to run

First, install dependencies

# clone project   
git clone https://github.com/KrishnaswamyLab/ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers.git


# install project   
cd ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers 
pip install -e .   
pip install -r requirements.txt

Usage

Training models

# run training script
python train_relso.py  --data gifford

*note: if arg option is not relevant to current model selection, it will not be used. See init method of each model to see what's used.

available dataset args:

    gifford, GB1_WU, GFP, TAPE

available auxnetwork args:

    base_reg

Original data sources

You might also like...

An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

195 Dec 17, 2022

Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

24 Oct 26, 2022

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

1.4k Jan 8, 2023

Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

394 Dec 23, 2022

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

3.7k Jan 3, 2023

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

NLopt is a library for nonlinear local and global optimization, for functions with and without gradient information. It is designed as a simple, unifi

1.4k Dec 25, 2022

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

6 Dec 14, 2022

Puzzle-CAM: Improved localization via matching partial and full features.

Puzzle-CAM The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

150 Nov 14, 2022

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

35 Oct 26, 2022

Comments

Conda env create not working
When I type in the command as instructed in how to run, I get this error:

Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you. Collecting package metadata (repodata.json): done Solving environment: failed

ResolvePackageNotFound:

libcxx==12.0.0=h2f01273_0

python==3.10.4=hdfd78df_0

openssl==1.1.1q=hca72f7f_0

ncurses==6.3=hca72f7f_3

readline==8.1.2=hca72f7f_1

bzip2==1.0.8=h1de35cc_0

ca-certificates==2022.07.19=hecd8cb5_0

xz==5.2.5=hca72f7f_1

libffi==3.3=hb1e8313_2

zlib==1.2.12=h4dc903c_2

sqlite==3.38.5=h707629a_0

tk==8.6.12=h5d9f67b_0
opened by Pixelatory 1
May the internal information of gifford data leads to a bias results given by model?

I'm very intersted in your work and analysize the gifford data. Firstly, I use the CD-HIT( a Cluster tool) split into different clusters.Then, I chose the sequence (comes the Clsuter-1(a cluster subset contaiing similar sequences given by CD-HIT)) with highest enrich value as a baseline, and focus on the residue difference between it and others sequences. Very interstingly, i find those sequences that containg 2 or 3 different residues compared to baseline sequence, usually have high enrichments. In Top-100 high enrichments, it can at 65%. As i know， your work is a multitask that both focus on generation and prediction. **I wonder that whether the JT-VAE tends to produce the new sequences that different from the corresponding baseline sequence with highest enrichment about 2 or 3 different residues , and the prediction neural network may think such sequences are good results.**It means that the model only need to realize the fact that compared to high enrich sequnces,the new sequnces contain 2 or 3 different residues is good enough. Beacuse i not find your results, i hope you can give me some advices.

opened by chengyunzhang 0

Releases(v1.0)

v1.0(Jul 31, 2022)

Published version of code and dataset.
Source code(tar.gz)
Source code(zip)

Owner

Krishnaswamy Lab

GitHub Repository

The world's largest toxicity dataset.

The Toxicity Dataset by Surge AI Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's wh

134 Dec 19, 2022

Least Square Calibration for Peer Reviews

Least Square Calibration for Peer Reviews Requirements gurobipy - for solving convex programs GPy - for Bayesian baseline numpy pandas To generate p

[email protected]"> 1 Nov 01, 2021

This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"

USSBriefs2021 This provides the R code and data to replicate results in "The USS Trustee’s risky strategy" by Neil M Davies, Jackie Grant and Chin Yan

1 Oct 30, 2021

Image Segmentation Evaluation

Image Segmentation Evaluation Martin Keršner, [email protected] Evaluation

273 Oct 28, 2022

StackNet is a computational, scalable and analytical Meta modelling framework

StackNet This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science

1.3k Dec 15, 2022

Official pytorch implementation of paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

Dual-Level Collaborative Transformer for Image Captioning This repository contains the reference code for the paper Dual-Level Collaborative Transform

160 Dec 11, 2022

Spatial Transformer Nets in TensorFlow/ TensorLayer

MOVED TO HERE Spatial Transformer Networks Spatial Transformer Networks (STN) is a dynamic mechanism that produces transformations of input images (or

36 Nov 23, 2022

Malware Analysis Neural Network project.

MalanaNeuralNetwork Description Malware Analysis Neural Network project. Table of Contents Getting Started Requirements Installation Clone Set-Up VENV

2 Nov 13, 2021

The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

12 Nov 02, 2022

A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking.

BeatNet A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking. This repository

157 Dec 27, 2022

An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

35 Oct 26, 2022

Improved Fitness Optimization Landscapes for Sequence Design

Related tags

Overview

ReLSO

Description

Citation

How to run

Usage

Training models

available dataset args:

available auxnetwork args:

Original data sources

You might also like...

An implementation of a sequence to sequence neural network using an encoder-decoder

Sequence lineage information extracted from RKI sequence data repo

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Aircraft design optimization made fast through modern automatic differentiation

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Puzzle-CAM: Improved localization via matching partial and full features.

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Comments

Conda env create not working

May the internal information of gifford data leads to a bias results given by model?

Releases(v1.0)

v1.0(Jul 31, 2022)

Owner

Krishnaswamy Lab

The world's largest toxicity dataset.

Least Square Calibration for Peer Reviews

This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"

Image Segmentation Evaluation

StackNet is a computational, scalable and analytical Meta modelling framework

Official pytorch implementation of paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

Spatial Transformer Nets in TensorFlow/ TensorLayer

Malware Analysis Neural Network project.

The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking.

PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML)

Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)

使用深度学习框架提取视频硬字幕；docker容器免安装深度学习库，使用本地api接口使得界面和后端识别分离；

Winners of DrivenData's Overhead Geopose Challenge

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control

Python package for Bayesian Machine Learning with scikit-learn API

Calibrate your listeners! Robust communication-based training for pragmatic speakers. Findings of EMNLP 2021.

ICNet for Real-Time Semantic Segmentation on High-Resolution Images, ECCV2018

An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.