Natural language processing summarizer using three state-of-the-art Transformer models: BERT, GPT2, and T5

Last update: Feb 07, 2022

NLP-Summarizer

This project aimed to explain the current limitations of Natural Language Processing (NLP) models by exploring the Transformer, the latest state-of-the-art NLP architecture, and to discuss possible use cases for such tools at home and in the workplace. It gave an in-depth explanation of the architecture, the limitations it was designed to address, and how it can be applied to a range of tasks. It also explored numerous NLP use cases and the impact that tools like this can have, both domestically and in the workplace. Three specific Transformer models were implemented behind a GUI to evaluate their effectiveness. The final artefact lets a user interact with the models to summarise documents at variable output lengths.
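
The variable output length described above can be sketched with the Hugging Face `transformers` summarization pipeline. This is a minimal illustration under stated assumptions, not the project's own code: the `t5-small` checkpoint, the `length_bounds` helper, and the default ratio are all chosen for the example, and word counts are used only as a rough proxy for the token counts the model works with.

```python
def length_bounds(word_count, ratio, floor=10):
    """Return (min_length, max_length) for a summary.

    Hypothetical helper: targets `ratio` of the input length, never going
    below `floor` for the upper bound.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    max_length = max(floor, int(word_count * ratio))
    min_length = max(1, max_length // 2)
    return min_length, max_length


def summarize(text, ratio=0.2, model_name="t5-small"):
    """Summarize `text` to roughly `ratio` of its length (assumed defaults)."""
    # Imported lazily so length_bounds works without the library installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)
    min_length, max_length = length_bounds(len(text.split()), ratio)
    result = summarizer(
        text,
        min_length=min_length,
        max_length=max_length,
        truncation=True,  # inputs beyond the model's limit are cut off
    )
    return result[0]["summary_text"]
```

Calling `summarize(document, ratio=0.1)` would request a shorter summary than the default; the first call downloads the chosen checkpoint.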

Working Example

The following example was created using another student's project introduction; the original word count was ~1000.

Initial GUI

After Summarization

Getting Started

All code was run using Python version 3.8.8.
Running the artefact in its entirety requires ~20GB of available space to download the pre-trained models.

!pip install transformers
!pip install spacy==2.0.12
!pip install torch
!pip install tk
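
Before downloading the models, it may help to confirm that the environment matches the requirements above. The following is a small sketch using only the standard library; the `preflight` and `free_space_gb` names are hypothetical, and the default path `"."` is an assumption, so pass the directory where the models will actually be stored.

```python
import shutil
import sys

REQUIRED_FREE_GB = 20  # approximate figure from the requirements above


def free_space_gb(path="."):
    """Free disk space, in gigabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def preflight(path=".", required_gb=REQUIRED_FREE_GB):
    """Return a list of warnings; an empty list means no problem was found."""
    warnings = []
    if sys.version_info[:2] != (3, 8):
        warnings.append(
            "tested with Python 3.8.8, running %d.%d" % sys.version_info[:2]
        )
    free = free_space_gb(path)
    if free < required_gb:
        warnings.append(
            "only %.1f GB free, about %d GB needed" % (free, required_gb)
        )
    return warnings


if __name__ == "__main__":
    for message in preflight():
        print("warning:", message)
```

The check only warns: other Python versions may work, but only 3.8.8 is stated as tested.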

The runtime of each summarization is printed to the console.
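
One way to measure and print a runtime like this is with `time.perf_counter`. The sketch below is illustrative only: `timed` and `fake_summarize` are hypothetical names, and the stand-in function simply keeps the first sentence so the example runs without downloading a model.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_summarize(text):
    # Hypothetical stand-in for a model call: keeps the first sentence only.
    return text.split(". ")[0].rstrip(".") + "."


summary, seconds = timed(fake_summarize, "First sentence. Second sentence.")
print(f"Runtime: {seconds:.3f} s")
```

Replacing `fake_summarize` with a real model call would report the time spent in that call, including any model loading done inside it.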

Owner

Samuel Sharkey
