Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model is built with the HuggingFace Transformers library, starting from the pretrained DeBERTa model. It is further pre-trained with masked language modelling (MLM) on a large corpus of unannotated discharge summaries, then fine-tuned on a corpus of discharge summaries annotated using Prodigy. The model performs named entity recognition (NER), but final performance is measured at the document level, using the maximum token-level score as the document score.
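The document-level scoring described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the function names, the handling of empty documents, and the 0.5 decision threshold are all assumptions made for the example.

```python
from typing import Sequence


def document_score(token_scores: Sequence[float]) -> float:
    """Collapse token-level ADR scores into one document-level score.

    The document score is the maximum token-level score, so a single
    confidently flagged token is enough to flag the whole document.
    """
    if not token_scores:
        # Assumption: a document with no tokens gets a score of 0.0.
        return 0.0
    return max(token_scores)


def document_label(token_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the document as containing an ADR (threshold is hypothetical)."""
    return document_score(token_scores) >= threshold


if __name__ == "__main__":
    scores = [0.02, 0.10, 0.87, 0.05]  # made-up token-level ADR probabilities
    print(document_score(scores), document_label(scores))
```

Taking the maximum means the document-level metric depends only on the most confident token, so it is insensitive to how many ADR tokens a summary contains.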

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder of discharge summaries stored in CSV files, tokenizes them, and continues MLM training from deberta-base.

Fine-tuning can then be performed from the command line with the finetune script. The script expects the data to be either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl) or a saved HuggingFace Dataset. If you run the script once on a JSONL file of annotations, you can save the resulting Dataset to a folder (--save_data_dir "save_to_here") and point subsequent training runs at that folder (--datafile "save_to_here").
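For orientation, a Prodigy NER export is a JSONL file in which each line is a JSON object with a "text" field, a "spans" list of character offsets with labels, and an "answer" field. The sketch below reads such a file using only the standard library. It is not the finetune script's own loader: the function names, the "ADR" label in the sample, and the choice to keep only accepted annotations are assumptions for illustration.

```python
import json
from typing import Iterator, Optional


def parse_prodigy_line(line: str) -> Optional[dict]:
    """Parse one line of a Prodigy JSONL export into text plus labelled spans."""
    line = line.strip()
    if not line:
        return None
    record = json.loads(line)
    # Assumption: keep only accepted annotations; a missing "answer" counts as accepted.
    if record.get("answer", "accept") != "accept":
        return None
    spans = [(s["start"], s["end"], s["label"]) for s in record.get("spans", [])]
    return {"text": record["text"], "spans": spans}


def read_prodigy_jsonl(path: str) -> Iterator[dict]:
    """Yield parsed examples from a UTF-8 encoded Prodigy JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            example = parse_prodigy_line(line)
            if example is not None:
                yield example


if __name__ == "__main__":
    # Made-up sample line; "ADR" is a hypothetical label name.
    sample = json.dumps({
        "text": "Rash after amoxicillin.",
        "spans": [{"start": 0, "end": 4, "label": "ADR"}],
        "answer": "accept",
    })
    print(parse_prodigy_line(sample))
```

The character offsets still need to be aligned to tokenizer tokens before NER training; that step is left out here.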

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: you might find that your exported annotations (JSONL file) are not encoded in UTF-8, which will prevent this code from working. There are various ways to change the encoding, all easy to find with a quick search. On a Windows machine, for example, run the following in PowerShell, substituting your own file names:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl
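If you would rather do the conversion in Python, something like the following should work on any platform. It is a sketch that assumes the source file is UTF-16, which is what PowerShell calls "Unicode"; the function name and file names are placeholders.

```python
def reencode_to_utf8(src_path: str, dst_path: str, src_encoding: str = "utf-16") -> None:
    """Rewrite a text file as UTF-8, reading it with the given source encoding.

    newline="" on both handles keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    import os
    import tempfile

    # Demonstrate on a throwaway UTF-16 file so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "name_of_file.jsonl")
        dst = os.path.join(tmp, "name_of_new_file.jsonl")
        with open(src, "w", encoding="utf-16", newline="") as f:
            f.write('{"text": "example"}\n')
        reencode_to_utf8(src, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read())
```

If the export uses a different encoding, pass it through the src_encoding argument.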

Owner

Medicines Optimisation Service - Austin Health
