Quick insights from Zoom meeting transcripts using Graph + NLP

Last update: Sep 17, 2022

Overview

Transcript Analysis - Graph + NLP

This program extracts insights from Zoom Meeting Transcripts (.vtt) using TigerGraph and NLTK.

To run this program, edit the auth.ini file with your graph solution credentials and file paths, then run main.py. A sample transcript is provided, but feel free to add your own to the \a_raw_transcripts directory!
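As a rough illustration of how settings like these could be read, here is a minimal sketch using Python's standard configparser. The section and key names (tigergraph, paths, host, raw_dir, and so on) are assumptions made for this example; check the auth.ini shipped with the repository for the names main.py actually expects.

```python
import configparser

# Hypothetical auth.ini contents; the real file's section and key names may differ.
SAMPLE_INI = """
[tigergraph]
host = https://your-solution.i.tgcloud.io
graphname = TranscriptAnalysis
username = tigergraph
password = change-me

[paths]
raw_dir = a_raw_transcripts
compact_dir = b_cmt_transcripts
"""

def load_settings(ini_text: str) -> dict:
    """Parse INI text into a plain {section: {key: value}} dictionary."""
    # interpolation=None keeps '%' characters in passwords from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = {section: dict(parser[section]) for section in parser.sections()}
    # Fail early if a section this sketch relies on is absent.
    missing = [name for name in ("tigergraph", "paths") if name not in settings]
    if missing:
        raise KeyError(f"auth.ini is missing sections: {missing}")
    return settings

settings = load_settings(SAMPLE_INI)
print(settings["paths"]["raw_dir"])
```

To read a file on disk instead of the embedded sample, parser.read("auth.ini") can replace parser.read_string(...).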

As of now, this program performs the following tasks:

- Convert .vtt into a compact version (stored in \b_cmt_transcripts)
- NLP analysis of the compact transcript (using NLTK)
  - Sentiment analysis
  - Trigrams (collocations)
  - Frequency of words (plotted)
  - Meaningful words (shown as a word cloud)
  - Number and names of speakers
  - Who spoke the longest and the least, plus the average speaking time
- Graph analysis of the compact transcript (using TigerGraph)
  - Analyze relationships between speakers
  - Who asked the most/fewest questions
  - Pair with the most back-and-forth
  - (TODO): Linking topics in a semantic graph
  - (TODO): Named-Entity Recognition
- Visual output of all determined insights
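The first task and the speaker statistics can be sketched with the standard library alone. The example below is not the repository's implementation: the layout of the compact files in \b_cmt_transcripts is not described here, so the "Name: text" cue format, the merging of consecutive cues from one speaker, and the helper names parse_vtt, compact, and speaking_time are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical Zoom-style transcript; each cue's text starts with "Name: ".
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Alice: Hi all, shall we start?

2
00:00:04.500 --> 00:00:06.500
Alice: Bob, can you share the agenda?

3
00:00:07.000 --> 00:00:10.000
Bob: Sure, here it is.
"""

CUE = re.compile(r"(\d+):(\d+):(\d+)\.(\d+) --> (\d+):(\d+):(\d+)\.(\d+)")

def to_seconds(h, m, s, ms):
    """Convert timestamp fields (millisecond precision assumed) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_vtt(text):
    """Yield (speaker, duration_seconds, utterance) for each cue."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = CUE.match(line.strip())
        if not match or i + 1 >= len(lines):
            continue
        parts = match.groups()
        duration = to_seconds(*parts[4:]) - to_seconds(*parts[:4])
        # Only the first text line of a cue is read in this sketch.
        speaker, sep, utterance = lines[i + 1].partition(": ")
        if sep:  # skip cues without a "Name: " prefix
            yield speaker.strip(), duration, utterance.strip()

def compact(cues):
    """Merge consecutive cues from the same speaker into a single turn."""
    merged = []
    for speaker, duration, utterance in cues:
        if merged and merged[-1][0] == speaker:
            _, prev_duration, prev_text = merged[-1]
            merged[-1] = (speaker, prev_duration + duration, prev_text + " " + utterance)
        else:
            merged.append((speaker, duration, utterance))
    return merged

def speaking_time(turns):
    """Total seconds spoken per speaker."""
    totals = defaultdict(float)
    for speaker, duration, _ in turns:
        totals[speaker] += duration
    return dict(totals)

turns = compact(parse_vtt(SAMPLE_VTT))
totals = speaking_time(turns)
longest = max(totals, key=totals.get)
shortest = min(totals, key=totals.get)
average = sum(totals.values()) / len(totals)
print(len(totals), sorted(totals), longest, shortest, average)
```

The merged utterances could then be handed to NLTK for the sentiment, trigram, and frequency steps listed above.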

Usage

A TigerGraph Cloud Portal solution (https://tgcloud.io/) will be required to run this program.

The GraphStudio link is: https://transcript-analysis.i.tgcloud.io/

The schema used in this graph is described below:

Vertex: speaker

(PRIMARY ID) name - STRING

Edge: asked_question

text - STRING

Edge: answered_question
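To make the schema more concrete, the sketch below derives asked_question and answered_question edges from an ordered list of turns in plain Python. The rule it uses (a turn ending in "?" is a question directed at whoever speaks next) is an assumption for illustration and may not match how the program detects questions. The turns are made up, and loading the edges into TigerGraph is left out.

```python
from collections import Counter

# Hypothetical compact transcript: (speaker, utterance) turns in spoken order.
TURNS = [
    ("Alice", "Bob, can you share the agenda?"),
    ("Bob", "Sure, here it is."),
    ("Carol", "Bob, did the deploy finish?"),
    ("Bob", "Yes, an hour ago."),
    ("Carol", "And is the rollback plan written?"),
    ("Bob", "It is in the wiki."),
]

def build_edges(turns):
    """Return (asked_question, answered_question) edge lists.

    Assumed heuristic: a turn ending in '?' is a question aimed at the
    next speaker, and the next speaker's turn counts as the answer.
    """
    asked, answered = [], []
    for (asker, text), (responder, _) in zip(turns, turns[1:]):
        if text.rstrip().endswith("?") and responder != asker:
            # asked_question carries the question text as its attribute.
            asked.append((asker, responder, {"text": text}))
            # answered_question has no attributes in the schema above.
            answered.append((responder, asker))
    return asked, answered

asked, answered = build_edges(TURNS)

# Who asked the most questions (out-degree on asked_question).
questions_per_speaker = Counter(src for src, _, _ in asked)
top_asker = questions_per_speaker.most_common(1)[0][0]

# Pair with the most back-and-forth, ignoring edge direction.
pair_counts = Counter(tuple(sorted((src, dst))) for src, dst, _ in asked)
top_pair = pair_counts.most_common(1)[0][0]

print(top_asker, top_pair)
```

In the graph itself, the same answers would come from counting edges per speaker vertex and per pair of vertices.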

Here is an example of the graph populated with the sample transcript provided:

Analysis

Here is a screenshot of the command-line output produced:

Here is a frequency chart of meaningful words generated:

Here is a word cloud that visualizes common, key terms:

More features coming soon! In the meantime, feel free to continue creating and adding new insights 😁


Owner

Advit Deepak
