Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022

Overview

PLBART

Code pre-release of our work, "Unified Pre-training for Program Understanding and Generation", accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and on natural language descriptions collected from GitHub and Stack Overflow.

Evaluation tasks

We evaluated PLBART on five tasks:

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]
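
The five tasks differ in what goes in and what comes out, which a small registry can make explicit. The following is only an illustrative sketch and not part of this repository: the `Task` class, the `TASKS` list, and the `tasks_of_kind` helper are hypothetical names, the fifth task is assumed to be vulnerability detection, and the grouping into generation and classification is our reading of the task names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record describing one evaluation task."""
    name: str
    source: str  # what the model reads
    target: str  # what the model produces
    kind: str    # "generation" or "classification" (assumed grouping)


# Assumed input/output modalities for the five tasks listed above.
TASKS = [
    Task("code summarization", "code", "text", "generation"),
    Task("code generation", "text", "code", "generation"),
    Task("code translation", "code", "code", "generation"),
    Task("clone detection", "code pair", "label", "classification"),
    Task("vulnerability detection", "code", "label", "classification"),
]


def tasks_of_kind(kind):
    """Return the names of all tasks with the given kind, in list order."""
    return [task.name for task in TASKS if task.kind == kind]


if __name__ == "__main__":
    for kind in ("generation", "classification"):
        print(kind, "->", ", ".join(tasks_of_kind(kind)))
```

Keeping the modalities in one place like this is a convenient way to see that three tasks produce sequences while two produce a label.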

Notes

We will publish the pre-trained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, CodeXGLUE, and TransCoder. We thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}

Owner

Wasi Ahmad
