Quantifiers-and-Negations-in-RE-Documents

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor studies in 2019. The Python project finds quantifiers and negations in documents and searches them for problematic findings. Problematic findings are, for example, sentences that combine quantifiers and negations in ways that are ambiguous, meaning the sentence has more than one valid interpretation. The project can extract these sentences and report them.
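A typical finding is a sentence like "All inputs are not validated.", which can mean either that no input is validated or that not every input is validated. The following is a minimal sketch of a string-based search for such sentences. It assumes a simple rule: a sentence is flagged when it contains at least one quantifier and at least one negation. The word lists, the rule and the function names are hypothetical and are not taken from this project, which looks for specific combinations that are not listed here.

```python
import re

# Hypothetical word lists for illustration only; the project's own lists may differ.
QUANTIFIERS = {"all", "every", "each", "any", "some", "many", "always"}
NEGATIONS = {"not", "no", "never", "none", "nothing", "nobody", "cannot"}

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace after ., ! or ?
WORD = re.compile(r"[A-Za-z']+")            # words, keeping apostrophes


def split_sentences(text):
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def is_negation(word):
    """Treat listed words and contractions like "doesn't" as negations."""
    return word in NEGATIONS or word.endswith("n't")


def find_problematic(text):
    """Return the sentences that contain at least one quantifier and one negation."""
    findings = []
    for sentence in split_sentences(text):
        words = [w.lower() for w in WORD.findall(sentence)]
        if any(w in QUANTIFIERS for w in words) and any(is_negation(w) for w in words):
            findings.append(sentence)
    return findings


if __name__ == "__main__":
    spec = ("All inputs are not validated. The system shall log every error. "
            "Each request doesn't require authentication.")
    for sentence in find_problematic(spec):
        print(sentence)
    # Prints the first and the third sentence.
```

Plain co-occurrence is deliberately crude: it catches the ambiguous cases above, but it would also flag unambiguous sentences, which is exactly the kind of trade-off the comparison in this project looks at.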

Motivation:

You want to avoid ambiguous sentences because they can cause problems that are hard to find and possibly hard to fix. This is especially true for technical specifications and similar documents. In this project we compare two different approaches to finding ambiguous sentences:

String-based search
NLP-based search

We want to find out whether NLP-based search gives better results than standard string-based search, and whether those results justify its computational overhead.
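The NLP-based approach works on annotations produced by CoreNLP instead of on raw strings. The sketch below is a hypothetical illustration, not this project's code: it flags a sentence when it contains both a quantifier lemma and a negation lemma, working on a dict shaped like CoreNLP's JSON output, where each sentence holds a list of tokens with "word", "lemma" and "pos" fields. The sample annotation is hand-written rather than real CoreNLP output, and the lemma lists are assumptions.

```python
# Hypothetical lemma lists for illustration only.
QUANTIFIER_LEMMAS = {"all", "every", "each", "any", "some", "many", "always"}
NEGATION_LEMMAS = {"not", "n't", "no", "never", "none", "nothing", "nobody"}


def find_problematic_annotated(annotation):
    """Flag sentences of a CoreNLP-style JSON dict that contain both a
    quantifier lemma and a negation lemma; return them as token strings."""
    findings = []
    for sentence in annotation["sentences"]:
        tokens = sentence["tokens"]
        lemmas = {tok["lemma"].lower() for tok in tokens}
        if lemmas & QUANTIFIER_LEMMAS and lemmas & NEGATION_LEMMAS:
            findings.append(" ".join(tok["word"] for tok in tokens))
    return findings


# Hand-written sample in the shape of CoreNLP's JSON output (not real output).
SAMPLE = {
    "sentences": [
        {"tokens": [
            {"word": "Each", "lemma": "each", "pos": "DT"},
            {"word": "request", "lemma": "request", "pos": "NN"},
            {"word": "does", "lemma": "do", "pos": "VBZ"},
            {"word": "n't", "lemma": "not", "pos": "RB"},
            {"word": "require", "lemma": "require", "pos": "VB"},
            {"word": "authentication", "lemma": "authentication", "pos": "NN"},
            {"word": ".", "lemma": ".", "pos": "."},
        ]},
        {"tokens": [
            {"word": "Every", "lemma": "every", "pos": "DT"},
            {"word": "error", "lemma": "error", "pos": "NN"},
            {"word": "is", "lemma": "be", "pos": "VBZ"},
            {"word": "logged", "lemma": "log", "pos": "VBN"},
            {"word": ".", "lemma": ".", "pos": "."},
        ]},
    ]
}

if __name__ == "__main__":
    for sentence in find_problematic_annotated(SAMPLE):
        print(sentence)
    # Prints only the first sample sentence.
```

Because CoreNLP's tokenizer already splits contractions such as "doesn't" into separate tokens, the matching step stays simple. The POS tags (or dependency relations) in the same output are where a finer rule could be plugged in; whether that extra work pays off is the question this project asks.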

Features:

Detect quantifiers and negations in .xml or .txt documents
Search either with a string-based search or with an NLP-based search (using Stanford's CoreNLP library [1])
Extract possibly ambiguous sentences
Compare string search results with NLP search results
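Comparing the two searches comes down to comparing the sets of sentences each one flags. A minimal sketch, assuming both searches report the index of every flagged sentence in the document (a hypothetical interface, chosen because the two approaches may not reproduce sentence text identically): it lists the findings both searches agree on, the findings unique to each, and their Jaccard similarity as a single agreement score.

```python
def compare_findings(string_ids, nlp_ids):
    """Compare the sentence indices flagged by the string-based and the NLP-based search."""
    string_set, nlp_set = set(string_ids), set(nlp_ids)
    both = string_set & nlp_set
    union = string_set | nlp_set
    return {
        "both": sorted(both),
        "string_only": sorted(string_set - nlp_set),
        "nlp_only": sorted(nlp_set - string_set),
        # Jaccard similarity: shared findings / all findings (1.0 if both are empty).
        "jaccard": len(both) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    print(compare_findings([0, 2, 5], [0, 2, 3]))
    # {'both': [0, 2], 'string_only': [5], 'nlp_only': [3], 'jaccard': 0.5}
```

Agreement alone does not say which search is more accurate; deciding that would require sentences labelled as ambiguous or not by hand.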

Prerequisites:

Java 8 or higher
Python 3.6 or higher as project interpreter
Stanford CoreNLP library: https://stanfordnlp.github.io/CoreNLP/download.html
Environment variable "CORENLP_HOME" set to the directory where the CoreNLP library is stored
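Before running the NLP-based search it can help to check these prerequisites programmatically. The sketch below is a hypothetical helper, not part of the project: it checks the Python version, that a java executable is on the PATH (it does not check the Java version) and that CORENLP_HOME points to an existing directory.

```python
import os
import shutil
import sys


def check_prerequisites(environ=None):
    """Return a list of problems with the setup described under Prerequisites."""
    environ = os.environ if environ is None else environ
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or higher is required")
    if shutil.which("java") is None:
        problems.append("java was not found on PATH")
    home = environ.get("CORENLP_HOME")
    if not home:
        problems.append("CORENLP_HOME is not set")
    elif not os.path.isdir(home):
        problems.append("CORENLP_HOME does not point to a directory: " + home)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

Passing `environ` explicitly makes the check easy to test without touching the real environment.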

References:

[1] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.

Owner

Nicolas Ruscher
