Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a seq2seq model. The encoder uses a pretrained EfficientNet-b3 to extract image features, and the decoder is an LSTM with Bahdanau attention.
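To make the attention step concrete, here is a minimal, framework-free sketch of Bahdanau (additive) attention. It is not the repository's implementation (which presumably uses PyTorch layers); the function name `additive_attention` and the parameters `w_enc`, `w_dec` and `v` are hypothetical and chosen only for illustration.

```python
import math


def additive_attention(encoder_features, decoder_hidden, w_enc, w_dec, v):
    """Bahdanau (additive) attention over a list of encoder feature vectors.

    score_i = v . tanh(W_enc e_i + W_dec h)
    weights = softmax(scores)
    context = sum_i weights_i * e_i
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    # The decoder state is projected once and reused for every image region.
    projected_hidden = matvec(w_dec, decoder_hidden)

    scores = []
    for feature in encoder_features:
        projected_feature = matvec(w_enc, feature)
        energy = [math.tanh(a + b)
                  for a, b in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))

    # Numerically stable softmax over the region scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder features.
    feature_dim = len(encoder_features[0])
    context = [sum(w * f[d] for w, f in zip(weights, encoder_features))
               for d in range(feature_dim)]
    return context, weights
```

At each decoding step the LSTM would receive the context vector together with the embedding of the previous word, so the decoder can focus on different image regions for different words.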

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
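As an illustration of the image-to-captions pairing, the sketch below groups captions by image. It assumes the caption file is a CSV with a header row and the columns `image` and `caption`; that layout is an assumption and is not stated in this README, so adjust the column names to the actual file. The helper name `group_captions` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_captions(lines):
    """Map each image file name to the list of its captions.

    `lines` is any iterable of text lines, e.g. an open file.
    Assumed layout: a header row followed by `image,caption` rows.
    """
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Tiny in-memory stand-in for the real caption file.
sample = io.StringIO(
    "image,caption\n"
    "dog.jpg,A dog runs on grass .\n"
    "dog.jpg,A brown dog is running .\n"
    'cat.jpg,"A cat, asleep on a sofa ."\n'
)
grouped = group_captions(sample)
```

With the real file you would pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="", encoding="utf-8")`.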

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
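One thing worth checking when editing the config: `glove-wiki-gigaword-200` provides 200-dimensional vectors, which matches `embedding_dimension: 200`. Assuming the embedding layer is initialised from these pretrained vectors (the README does not say so explicitly), the two values have to be changed together. The sketch below works on the config after it has been parsed into a dictionary; the helper names are hypothetical.

```python
import re


def glove_dimension(gensim_model_name):
    """Return the trailing vector size in a name like 'glove-wiki-gigaword-200'."""
    match = re.search(r"-(\d+)$", gensim_model_name)
    if match is None:
        raise ValueError(f"no dimension suffix in {gensim_model_name!r}")
    return int(match.group(1))


def check_embedding_dimension(config):
    """Raise if embedding_dimension disagrees with the pretrained vectors."""
    expected = glove_dimension(config["gensim_model_name"])
    actual = config["model"]["embedding_dimension"]
    if actual != expected:
        raise ValueError(
            f"embedding_dimension is {actual}, but "
            f"{config['gensim_model_name']} provides {expected}-d vectors"
        )
    return actual


# Subset of the default config shown above, as a parsed dictionary.
default_config = {
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {"embedding_dimension": 200, "decoder_hidden_dimension": 300},
}
check_embedding_dimension(default_config)
```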

Output

After training, the pipeline writes the following output:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history of the model
- valid_history - validation history of the model
- best_valid_loss - the best validation loss
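A short sketch of how the checkpoint fields listed above might be inspected once the file has been loaded. It assumes that `train_history` and `valid_history` are lists with one loss value per epoch and that epochs are counted from zero; neither is confirmed by this README. The helper name `summarize_checkpoint` is hypothetical.

```python
EXPECTED_KEYS = {
    "epoch",
    "model_state_dict",
    "optimizer_state_dict",
    "train_history",
    "valid_history",
    "best_valid_loss",
}


def summarize_checkpoint(checkpoint):
    """Check that all documented keys exist and report the best epoch."""
    missing = EXPECTED_KEYS - checkpoint.keys()
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")

    # Assumption: valid_history holds one validation loss per epoch.
    valid_history = checkpoint["valid_history"]
    best_epoch = min(range(len(valid_history)), key=valid_history.__getitem__)
    return {
        "last_epoch": checkpoint["epoch"],
        "best_epoch": best_epoch,
        "best_valid_loss": checkpoint["best_valid_loss"],
    }


# In practice the dictionary would come from PyTorch, e.g.
#   checkpoint = torch.load(path_to_model_file, map_location="cpu")
# Here a hand-written stand-in keeps the example self-contained.
example = {
    "epoch": 2,
    "model_state_dict": {},
    "optimizer_state_dict": {},
    "train_history": [3.1, 2.7, 2.5],
    "valid_history": [3.3, 2.9, 3.0],
    "best_valid_loss": 2.9,
}
summary = summarize_checkpoint(example)
```

To resume training or run inference, `model_state_dict` and `optimizer_state_dict` would be passed to the `load_state_dict` methods of the model and optimizer respectively.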
