This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Last update: Aug 19, 2022

Overview

Code-and-Dataset-for-CapSal


Our code is built on the Mask R-CNN implementation in TensorFlow and Keras. First install Mask R-CNN by following its instructions or INSTALL.md.

COCO-CapSal Dataset

The COCO-CapSal dataset provides saliency ground truth and image captions for each image. It contains 5265 training images and 1459 validation images. The annotations can be downloaded from BaiduYun or GoogleDrive. The folder 'capsal' contains the images, ground-truth maps, and captions (JSON file) for both the training and validation sets.
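As a rough illustration of working with the downloaded folder, the sketch below pairs images with ground-truth maps and loads the caption file. It is not part of the released code. The helper names `pair_by_stem` and `load_captions` are hypothetical, and it assumes that an image and its ground-truth map share the same file name stem, which this README does not confirm. The schema of the caption JSON is not documented here, so the file is loaded as-is.

```python
import json
from pathlib import Path


def pair_by_stem(image_dir, gt_dir):
    """Pair each image with the ground-truth map that has the same file stem.

    Assumption: matching files share a stem (e.g. 'x.jpg' and 'x.png').
    Images without a matching ground-truth map are skipped.
    """
    gt_by_stem = {p.stem: p for p in Path(gt_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(Path(image_dir).iterdir()):
        if img.is_file() and img.stem in gt_by_stem:
            pairs.append((img, gt_by_stem[img.stem]))
    return pairs


def load_captions(json_path):
    """Load the caption JSON without assuming anything about its layout."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Pass the actual image, ground-truth, and JSON paths from inside the 'capsal' folder once you have checked how they are laid out.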

Evaluation

To test the CapSal model, first download the trained model from BaiduYun or Google and put it under ./model. Then run test_capsal.py to obtain saliency maps for the different datasets. The resulting saliency maps are also available at Google or BaiduYun.
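Once saliency maps are available, one common way to compare a map against its ground truth is the mean absolute error (MAE). The sketch below is only an illustration and is not taken from test_capsal.py. It assumes both maps have already been loaded as equally sized 2-D lists of floats scaled to [0, 1], and the function name `mean_absolute_error` is hypothetical.

```python
def mean_absolute_error(pred, gt):
    """Return the MAE between two equally sized 2-D maps.

    Assumption: both maps are lists of rows with values scaled to [0, 1].
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    count = sum(len(row) for row in pred)
    if count == 0:
        raise ValueError("maps must not be empty")
    # Sum the per-pixel absolute differences, then average over all pixels.
    total = sum(
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    return total / count
```

Lower values mean the predicted map is closer to the ground truth. Averaging this value over a dataset gives a single score per dataset.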

Train

Run 'train.py'.

Citation

    @InProceedings{Zhang_2019_CVPR,
        author = {Zhang, Lu and Zhang, Jianming and Lin, Zhe and Lu, Huchuan and He, You},
        title = {CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection},
        booktitle = {CVPR},
        year = {2019}
    }

Owner

lu zhang
