Methods for performing Natural Language Processing on live data from Twitter

Last update: Nov 24, 2021

Twitter_NLP

Link to Project: https://twitoff-amadou.herokuapp.com/

==Description==

This project combines several methods to perform Natural Language Processing (NLP) on live data from Twitter. The goal is to demonstrate, at a basic level, how NLP can classify a piece of text by which Twitter user is most likely to tweet (post) it. Twitter API access was granted for this project and is used through the Tweepy wrapper for Python.

The web app is built with the Flask framework and deployed on Heroku. Data is extracted from Twitter through its API with the Tweepy library and stored in SQLAlchemy tables. These tables, which hold the information we care about, such as usernames and past tweets, are backed by our PostgreSQL database. The spaCy library then vectorizes the tweets into a form our models can operate on. Finally, a random forest classifier is trained on these vectors.
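The storage step described above could look roughly like the following. This is a minimal sketch, not the project's actual schema: the table names, column names, and the `store_tweets` helper are assumptions made for illustration, the `vectorize` argument stands in for whatever spaCy call the app uses, and the usage lines below run against an in-memory SQLite database rather than PostgreSQL.

```python
from sqlalchemy import BigInteger, Column, ForeignKey, PickleType, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    """A Twitter user whose tweets have been pulled (hypothetical schema)."""
    __tablename__ = "users"
    id = Column(BigInteger, primary_key=True)  # Twitter user id, supplied by caller
    name = Column(String(50), nullable=False, unique=True)
    tweets = relationship("Tweet", back_populates="user")


class Tweet(Base):
    """One tweet plus its precomputed vector (hypothetical schema)."""
    __tablename__ = "tweets"
    id = Column(BigInteger, primary_key=True)  # Twitter tweet id, supplied by caller
    text = Column(String(500), nullable=False)
    vector = Column(PickleType, nullable=False)  # pickled sequence of floats
    user_id = Column(BigInteger, ForeignKey("users.id"), nullable=False)
    user = relationship("User", back_populates="tweets")


def store_tweets(session, user_id, name, tweets, vectorize):
    """Insert a user and their tweets, skipping tweet ids already stored.

    `tweets` is an iterable of (tweet_id, text) pairs; `vectorize` maps a
    text to a picklable vector.
    """
    user = session.get(User, user_id) or User(id=user_id, name=name)
    session.add(user)
    for tweet_id, text in tweets:
        if session.get(Tweet, tweet_id) is None:
            user.tweets.append(Tweet(id=tweet_id, text=text, vector=vectorize(text)))
    session.commit()
    return user
```

A possible way to try it locally, with a toy vectorizer in place of spaCy:

```python
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    store_tweets(session, 1, "example_user", [(10, "hello world")],
                 vectorize=lambda text: [float(len(text))])
```

Storing the vector next to the tweet text means the classifier can be retrained without re-running the vectorizer on every request.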

The interface of the app is simple. There are two text boxes, one labeled "User to add" and the other "Tweet text to predict". Type a Twitter username into the "add" box so that Tweepy can add that user and their tweets to our PostgreSQL database. The random forest then trains live on the added data. Once at least two Twitter users are in the database, enter text into the "predict" box, select the two users to compare, and let the model produce a result.
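The train-then-predict flow for two selected users can be sketched as follows. This is an illustration under stated assumptions, not the app's code: `predict_author` and `hashed_bow` are hypothetical names, and `hashed_bow` is a simple hashed bag-of-words stand-in so the example runs without downloading a spaCy model. In the real app the vectors come from spaCy, so a callable returning something like `nlp(text).vector` would be passed as `vectorize` instead.

```python
import zlib

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def hashed_bow(text, dim=64):
    """Hypothetical stand-in for a spaCy document vector: hashed bag of words."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def predict_author(tweets_by_user, text, vectorize=hashed_bow):
    """Train a random forest on the users' tweets and guess who wrote `text`.

    `tweets_by_user` maps a username to a list of that user's tweet texts.
    """
    names = list(tweets_by_user)
    if len(names) < 2:
        raise ValueError("need tweets from at least two users to compare")
    # One row per tweet, labeled with the user who posted it
    X = np.vstack([vectorize(t) for name in names for t in tweets_by_user[name]])
    y = np.array([name for name in names for _ in tweets_by_user[name]])
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model.predict(vectorize(text).reshape(1, -1))[0]
```

Because the model is refit on every call, this mirrors the "train live" behavior described above; with many stored tweets, caching the fitted model per user pair would likely be a better trade-off.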

Owner

Rhyme with AI

LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Generating new names based on trends in data using GPT2 (Transformer network)

GNES enables large-scale index and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content form

Implementation of TTS with combination of Tacotron2 and HiFi-GAN

The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Top2Vec is an algorithm for topic modeling and semantic search.

Asr abc - Automatic speech recognition (ASR), Chinese speech recognition

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

Exploration of BERT-based models on twitter sentiment classifications

Sequence-to-Sequence learning using PyTorch

Segmenter - Transformer for Semantic Segmentation

Creating a python chatbot that Starbucks users can text to place an order + help cut wait time of a normal coffee.

TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

ChainKnowledgeGraph, an industry-chain knowledge graph covering three entity types: A-share listed companies, industries, and products

Retraining OpenAI's GPT-2 on Discord Chats

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

Russian words synonyms and antonyms