RGN2-Replica (WIP)

To eventually become an unofficial working PyTorch implementation of RGN2, a state-of-the-art model for MSA-less protein folding, intended for use when no evolutionary homologs are available (e.g. for protein design).

Install

$ pip install rgn2-replica

To load sample dataset

from datasets import load_from_disk
ds = load_from_disk("data/ur90_small")
print(ds['train'][0])

To convert to pandas for exploration

df = ds['train'].to_pandas()
df.sample(5)
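
For a quick look at the data without pandas, the rows can also be summarized in plain Python. The sketch below is only illustrative: the column name `sequence` and the toy records are assumptions, not the dataset's confirmed schema, so check `ds['train'].column_names` and adjust `key` accordingly.

```python
from collections import Counter
from statistics import mean


def summarize_sequences(records, key="sequence"):
    """Return length and residue statistics for an iterable of dict-like rows.

    `key` is a hypothetical column name; change it to match the real schema.
    """
    lengths = []
    residues = Counter()
    for record in records:
        seq = record[key]
        lengths.append(len(seq))
        residues.update(seq)  # counts each character (residue) in the string
    if not lengths:
        raise ValueError("no records to summarize")
    return {
        "count": len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": mean(lengths),
        "residue_counts": dict(residues),
    }


# Toy rows standing in for ds['train']; these are not real data from the repo.
toy = [{"sequence": "MKV"}, {"sequence": "MKVLA"}]
stats = summarize_sequences(toy)
print(stats["count"], stats["min_len"], stats["max_len"], stats["mean_len"])
```

Since iterating a `datasets` split yields one dict per row, `summarize_sequences(ds['train'])` should work once `key` matches an actual string column.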

To train ProteinLM

Run the following command with default parameters

python -m scripts.lmtrainer

This starts a training run on the CPU, using the sample dataset in the repo directory.

TO-DO LIST: ordered by priority

Contribute:

Hey there! New ideas are welcome: open/close issues, fork the repo and share your code with a Pull Request.

Currently, the main discussion about model development is happening in this discord server under the /self-supervised-learning channel.

Clone this project to your computer:

git clone https://github.com/EricAlcaide/pysimplechain

Please follow this guideline on open source contribution.

Citations:

@article {Chowdhury2021.08.02.454840,
    author = {Chowdhury, Ratul and Bouatta, Nazim and Biswas, Surojit and Rochereau, Charlotte and Church, George M. and Sorger, Peter K. and AlQuraishi, Mohammed},
    title = {Single-sequence protein structure prediction using language models from deep learning},
    elocation-id = {2021.08.02.454840},
    year = {2021},
    doi = {10.1101/2021.08.02.454840},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840},
    eprint = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840.full.pdf},
    journal = {bioRxiv}
}

@article{alquraishi_2019,
	author={AlQuraishi, Mohammed},
	title={End-to-End Differentiable Learning of Protein Structure},
	volume={8},
	DOI={10.1016/j.cels.2019.03.006},
	URL={https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6},
	number={4},
	journal={Cell Systems},
	year={2019},
	pages={292-301.e3}
}

Replication attempt for the Protein Folding Model

Owner

Eric Alcaide
