A transformer-based method for Healthcare Image Captioning in Vietnamese

Last update: May 05, 2022

Overview

vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese

This GitHub repository contains our solution for the vieCap4H Challenge 2021. In detail, we use grid features as the visual representation and pre-train a BERT-based language model, initialized from a PhoBERT-based pre-trained model, to obtain the language representation. In addition, we identify a suitable training schedule using the self-critical sequence training (SCST) technique to achieve the best results. In our experiments, we achieve an average BLEU of 30.3% on the public-test round and 28.9% on the private-test round, which rank 3rd and 4th, respectively.

Figure 1. An overview of our solution based on RSTNet
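
For readers unfamiliar with SCST, the idea can be sketched as follows: a sampled caption is rewarded relative to a baseline caption, and the log-probability of the sample is scaled by that difference. This is a minimal sketch of the commonly used formulation, not the code in this repository. The function name, its arguments, and the choice of baseline are assumptions for illustration, and the repository may compute its reward and baseline differently.

```python
def scst_loss(sample_log_probs, sample_rewards, baseline_rewards):
    """Mean self-critical loss over a batch (illustrative, hypothetical helper).

    sample_log_probs: summed log-probability of each sampled caption
    sample_rewards:   metric score of each sampled caption
    baseline_rewards: metric score of the baseline caption for the same image
    """
    if not (len(sample_log_probs) == len(sample_rewards) == len(baseline_rewards)):
        raise ValueError("all inputs must have the same length")
    if not sample_log_probs:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for log_p, r_sample, r_base in zip(sample_log_probs, sample_rewards, baseline_rewards):
        # Positive advantage: the sample beat the baseline, so minimizing the
        # loss raises its log-probability. Negative advantage lowers it.
        advantage = r_sample - r_base
        total += -advantage * log_p
    return total / len(sample_log_probs)
```

In a real training loop the log-probabilities would be differentiable tensors and the rewards would be treated as constants; plain floats are used here only to show the arithmetic.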

1. Data preparation

The grid features of vieCap4H can be downloaded via links below:

The dataset can be downloaded at https://aihub.vn/competitions/40. Annotations must be converted to COCO format. We have already converted them, and the result is available at:

viecap4h-public-train.json.
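
If you need to redo the conversion yourself, the sketch below shows one way to build a COCO-style captions dictionary with "images" and "annotations" lists. The input layout (a list of records with "id" holding the image file name and "captions" holding the text) is an assumption about the raw challenge file, and the function name is hypothetical. Check the provided viecap4h-public-train.json for the exact fields the training scripts expect.

```python
import json


def to_coco_captions(records):
    """Convert [{"id": file_name, "captions": text}, ...] to a COCO-style dict.

    The input field names are assumed, not confirmed by the repository.
    """
    images, annotations = [], []
    image_ids = {}  # file name -> integer image id
    for record in records:
        file_name = record["id"]
        if file_name not in image_ids:
            image_ids[file_name] = len(image_ids) + 1
            images.append({"id": image_ids[file_name], "file_name": file_name})
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_ids[file_name],
            "caption": record["captions"].strip(),
        })
    return {"images": images, "annotations": annotations}


# Made-up sample records, for illustration only.
sample = [
    {"id": "img_0001.jpg", "captions": "caption one"},
    {"id": "img_0001.jpg", "captions": "caption two"},
]
# ensure_ascii=False keeps Vietnamese characters readable in the output file.
print(json.dumps(to_coco_captions(sample), ensure_ascii=False, indent=2))
```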

2. Training

First, pre-train the BERT-based language model, starting from the PhoBERT-based pre-trained model:

python train_language.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the BERT-based model should appear in the folder saved_language_models.
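
Before moving on, it can help to confirm that the folder actually contains files. The helper below is a small sketch for that check. Its name is hypothetical, and it makes no assumption about checkpoint file names or extensions, since the README does not state them.

```python
from pathlib import Path


def list_checkpoints(folder="saved_language_models"):
    """Return the files in the checkpoint folder, oldest first.

    Returns an empty list if the folder does not exist.
    """
    path = Path(folder)
    if not path.is_dir():
        return []
    files = [p for p in path.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


found = list_checkpoints()
if found:
    print("Most recent file:", found[-1])
else:
    print("No files found in saved_language_models yet.")
```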

Then, train the Transformer model with the command below:

python train_transformer.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the Transformer-based model should appear in the folder saved_transformer_rstnet_models.

Here, <images path> is the data folder, <features path> is the path to the grid features folder, and <annotations folder> is the path to the folder that contains the file viecap4h-public-train.json.
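
Since both stages take the same arguments, they can be chained from one script. The following is a minimal sketch that uses only the script names and flags shown above. The helper names and the dry_run switch are assumptions added for illustration. With dry_run=True the commands are only printed, so the paths can be checked first.

```python
import subprocess
import sys


def build_command(script, img_path, features_path, annotation_folder, batch_size=40):
    """Build the argument list for one training script."""
    return [
        sys.executable, script,
        "--img_path", str(img_path),
        "--features_path", str(features_path),
        "--annotation_folder", str(annotation_folder),
        "--batch_size", str(batch_size),
    ]


def run_training(img_path, features_path, annotation_folder, batch_size=40, dry_run=True):
    """Run the language-model stage, then the Transformer stage, in order."""
    commands = [
        build_command(script, img_path, features_path, annotation_folder, batch_size)
        for script in ("train_language.py", "train_transformer.py")
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(command, check=True)
    return commands


# Placeholder paths; replace them with your own folders.
run_training("data/images", "data/features", "data/annotations")
```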

3. Inference

The results can be obtained with the command below:

python test_viecap.py

4. Pre-trained model

To reproduce our leaderboard results, the two pre-trained models (the BERT-based model and the Transformer model) can be downloaded via the links below:

Updating...

Owner

Doanh B C
