This is an NLP-based project to extract the effective date of a contract from its text file.

Last update: Jan 26, 2022

Overview

Date-Extraction-from-Contracts

Problem statement

The effective date of each contract needs to be identified from the given contract text. The dates can appear in any format, e.g. 01/01/2022, 1st Jan, 2022, 1st January, 2022, 01 Jan 2022, etc.
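To make the target concrete, the formats listed above can all be reduced to the same (day, month, year) triple. The sketch below is for illustration only and is not the project's method, which uses a learned model. The function name `parse_date`, the regular expressions, and the day-first reading of numeric dates are assumptions.

```python
import re

# Month lookup by three-letter prefix, so "Jan" and "January" both resolve.
MONTHS = {name: i + 1 for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}

# Numeric form such as 01/01/2022 (assumed to be day/month/year).
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
# Textual form such as "1st Jan, 2022" or "01 January 2022".
TEXTUAL = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]{3,9})\.?,?\s+(\d{4})\b")

def parse_date(text):
    """Return (day, month, year) for the first recognised date, else None."""
    m = NUMERIC.search(text)
    if m:
        day, month, year = (int(g) for g in m.groups())
        return day, month, year
    m = TEXTUAL.search(text)
    if m:
        month = MONTHS.get(m.group(2)[:3].lower())
        if month is not None:
            return int(m.group(1)), month, int(m.group(3))
    return None

for sample in ["01/01/2022", "1st Jan, 2022", "1st January, 2022", "01 Jan 2022"]:
    print(sample, "->", parse_date(sample))
```

Hand-written patterns like these only cover the formats they were written for, which is one reason to train a model on the contract text instead.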

Libraries Used

NumPy
TensorFlow
Keras
NLTK
scikit-learn
Matplotlib
pandas

Approach

Data preprocessing

A custom function was developed to preprocess the text, because the conventional libraries are not focused on preprocessing dates in a text corpus. The required tokenization and vectorization were done with nltk instead of the TensorFlow or Keras text preprocessors. The preprocessing includes data cleaning (removing improperly labeled data and badly named files), stopword removal, and punctuation removal that keeps the punctuation found inside a date, such as '/'. It also includes spacing to separate dates from words, since in some cases the numbers of a date were joined to the preceding word, followed by tokenization and vectorization of the words. A plain word-based vectorization was used, as TF-IDF would not have made much difference for date extraction.
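The steps described above can be sketched roughly as follows. This is a minimal standard-library approximation, not the project's actual function: the names `preprocess`, `build_vocab`, and `vectorize`, the tiny stopword set, and the padding scheme are all assumptions, and the project itself used nltk for tokenization and stopwords.

```python
import re

# Small illustrative stopword set (assumption); the project used nltk's list.
STOPWORDS = {"the", "of", "is", "as", "and", "or", "a", "an",
             "this", "to", "in", "on", "by"}

def preprocess(text):
    """Clean contract text while keeping date-like tokens intact."""
    text = text.lower()
    # Separate a word fused to a following number: "date01/01/2022" -> "date 01/01/2022".
    text = re.sub(r"([a-z])(\d)", r"\1 \2", text)
    # Remove punctuation except '/', which can separate the parts of a date.
    text = re.sub(r"[^a-z0-9/\s]", " ", text)
    # Remove any '/' that is not sitting between two digits (e.g. "and/or").
    text = re.sub(r"(?<!\d)/|/(?!\d)", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]

def build_vocab(token_lists):
    """Map each word to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(tokens, vocab, max_len):
    """Turn tokens into a fixed-length list of word ids."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

text = "This Agreement is effective as of the date01/01/2022, by and/or between the parties."
tokens = preprocess(text)
vocab = build_vocab([tokens])
print(tokens)
print(vectorize(tokens, vocab, max_len=8))
```

Keeping '/' only between digits lets a date such as 01/01/2022 survive as a single token while ordinary punctuation is dropped.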

[Figure: preprocessed data before vectorization]

Model Building

The model for this problem is an RNN-based model with a bidirectional LSTM layer. Its input is the preprocessed data, and it has three outputs that predict the day, month, and year respectively.
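A model of this shape could look roughly like the Keras sketch below. Only the single bidirectional LSTM layer and the three outputs come from the description above. The embedding layer, all layer sizes, the vocabulary size, the sequence length, and the choice of one scalar regression output per date part are assumptions; classification heads would be an equally plausible reading.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(vocab_size, max_len, embed_dim=64, lstm_units=64):
    """Bidirectional LSTM with three outputs: day, month, year."""
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    # Embedding size is a hypothetical choice, not taken from the project.
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # The single bidirectional LSTM layer described in the text.
    x = layers.Bidirectional(layers.LSTM(lstm_units))(x)
    # One scalar head per date part (assumed to be regression outputs).
    day = layers.Dense(1, name="day")(x)
    month = layers.Dense(1, name="month")(x)
    year = layers.Dense(1, name="year")(x)
    model = tf.keras.Model(inputs, [day, month, year])
    # A single loss string is applied to every output.
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical sizes for illustration only.
model = build_model(vocab_size=5000, max_len=200)
```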

The model was trained with a decaying learning rate, starting at 0.001, for 80 epochs with a batch size of 8.
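The type of decay is not specified, so the sketch below assumes a simple per-epoch exponential decay. The starting rate, epoch count, and batch size are taken from the text; `DECAY_RATE` and the function name `decayed_lr` are hypothetical.

```python
INITIAL_LR = 1e-3   # from the text
EPOCHS = 80         # from the text
BATCH_SIZE = 8      # from the text
DECAY_RATE = 0.96   # hypothetical value, not taken from the project

def decayed_lr(epoch, lr=None):
    """Learning rate for a given epoch under exponential decay.

    The unused `lr` argument keeps the (epoch, lr) signature accepted by
    tf.keras.callbacks.LearningRateScheduler.
    """
    return INITIAL_LR * DECAY_RATE ** epoch

schedule = [decayed_lr(epoch) for epoch in range(EPOCHS)]
print(f"first epoch: {schedule[0]:.6f}, last epoch: {schedule[-1]:.6f}")
```

With Keras, a function like this can be passed to `tf.keras.callbacks.LearningRateScheduler` and supplied to `model.fit` together with `epochs=EPOCHS` and `batch_size=BATCH_SIZE`.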

[Figure: model architecture]

Results

The model performed quite well for a baseline that extracts dates using just a single bidirectional LSTM layer. The prediction file is attached for reference.

Owner

Sambhav Garg
