FcaNet: Frequency Channel Attention Networks

Last update: Dec 27, 2022

Overview

PyTorch implementation of the paper "FcaNet: Frequency Channel Attention Networks".

Simplest usage

Models pretrained on ImageNet can be loaded directly, without any configuration or installation:

import torch

# Pick one variant; each call loads an ImageNet-pretrained model through torch.hub.
model = torch.hub.load('cfzd/FcaNet', 'fca34', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca50', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca101', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca152', pretrained=True)

Install

Please see INSTALL.md

Models

Classification models on ImageNet

Because the models were trained in FP16 and are provided in FP32, the evaluation results differ slightly (at most -0.06%/+0.05%) from the reported results.

Model	Reported Top-1 (%)	Evaluated Top-1 (%)	Link
FcaNet34	75.07	75.02	GoogleDrive/BaiduDrive(code:m7v8)
FcaNet50	78.52	78.57	GoogleDrive/BaiduDrive(code:mgkk)
FcaNet101	79.64	79.63	GoogleDrive/BaiduDrive(code:8t0j)
FcaNet152	80.08	80.02	GoogleDrive/BaiduDrive(code:5yeq)
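The small gaps between the reported and evaluated numbers are attributed above to FP16/FP32 conversion. As a rough illustration of why such a conversion is not lossless, the sketch below rounds a few values to IEEE 754 half precision and back using only the standard library. The weight values are hypothetical, and Python floats are 64-bit rather than FP32, so this only shows the kind of rounding involved, not the actual checkpoint conversion.

```python
import struct


def to_half_and_back(value: float) -> float:
    """Round a Python float to IEEE 754 half precision (FP16) and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -0.3333, 0.007812, 1.2345]

for w in weights:
    w16 = to_half_and_back(w)
    print(f"{w:+.6f} -> {w16:+.6f} (abs error {abs(w - w16):.2e})")
```

Values that are exactly representable in FP16, such as 0.5, survive the round trip unchanged; most others pick up a small error, which can shift an accuracy figure by a few hundredths of a percent.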

Detection and instance segmentation models on COCO

Model	Backbone	AP	AP50	AP75	Link
Faster RCNN	FcaNet50	39.0	61.1	42.3	GoogleDrive/BaiduDrive(code:q15c)
Faster RCNN	FcaNet101	41.2	63.3	44.6	GoogleDrive/BaiduDrive(code:pgnx)
Mask RCNN	Fca50 (det)	40.3	62.0	44.1	GoogleDrive/BaiduDrive(code:d9rn)
Mask RCNN	Fca50 (seg)	36.2	58.6	38.1	GoogleDrive/BaiduDrive(code:d9rn)

Training

Please see launch_training_classification.sh and launch_training_detection.sh for training on ImageNet and COCO, respectively.

Testing

Please see launch_eval_classification.sh and launch_eval_detection.sh for testing on ImageNet and COCO, respectively.

FAQ

Since the paper was uploaded to arXiv, many academic peers have asked us: the proposed DCT basis can be viewed as a simple tensor, so why not learn the tensor directly? Why use DCT instead of a learnable tensor? Surely a learnable tensor can do better than DCT.

Our answer is concrete: the proposed DCT is better than the learnable approach, counter-intuitive as that may be.

Method	ImageNet Top-1 Acc	Link
Learnable tensor, random initialization	77.914	GoogleDrive/BaiduDrive(code:p2hl)
Learnable tensor, DCT initialization	78.352	GoogleDrive/BaiduDrive(code:txje)
Fixed tensor, random initialization	77.742	GoogleDrive/BaiduDrive(code:g5t9)
Fixed tensor, DCT initialization (Ours)	78.574	GoogleDrive/BaiduDrive(code:mgkk)

To verify these results, select the corresponding type of tensor in L73-L83 of model/layer.py, uncomment it, and train the whole network.
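To make the "fixed tensor, DCT initialization" row more concrete, here is a minimal pure-Python sketch of a 2D DCT basis used as a fixed weight tensor. It is not the code from model/layer.py: the function names are hypothetical, and the orthonormal DCT-II normalization is an assumption. It also shows that the lowest frequency (0, 0) is a constant filter, so its response is proportional to global average pooling.

```python
import math


def dct_basis_1d(pos, freq, size):
    """Orthonormal DCT-II basis value at position `pos` for frequency `freq`."""
    value = math.cos(math.pi * freq * (pos + 0.5) / size) / math.sqrt(size)
    return value if freq == 0 else value * math.sqrt(2)


def dct_filter_2d(u, v, height, width):
    """2D DCT basis for frequency (u, v) as a height x width grid of weights."""
    return [
        [dct_basis_1d(y, u, height) * dct_basis_1d(x, v, width) for x in range(width)]
        for y in range(height)
    ]


def frequency_response(feature, basis):
    """Weighted sum of one channel's feature map with a fixed basis."""
    return sum(f * b for f_row, b_row in zip(feature, basis) for f, b in zip(f_row, b_row))


size = 7
# Hypothetical single-channel 7x7 feature map.
feature = [[float(y * size + x) for x in range(size)] for y in range(size)]

lowest = dct_filter_2d(0, 0, size, size)
mean = sum(map(sum, feature)) / (size * size)

# With a constant 1/size filter, the response equals size * mean (scaled GAP).
print(frequency_response(feature, lowest), size * mean)
```

A learnable variant would start from the same grid (or from random values) and update it during training; the fixed variant keeps it constant.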

TODO

Object detection models
Instance segmentation models
Fix the incorrect results of detection models
Make switching between configs easier

Comments

About the performance on cifar10 or cifar100.

Thanks for your work!!

Have you tried using FcaNet for classification on cifar10 or cifar100? If so, what frequency component settings did you use?

opened by NNNNAI 14
How should self.dct_h and self.dct_w be set?

The MultiSpectralAttentionLayer class contains the following part:

if h != self.dct_h or w != self.dct_w:
    x_pooled = torch.nn.functional.adaptive_avg_pool2d(x, (self.dct_h, self.dct_w))
    # If you have concerns about one-line-change, don't worry.   :)
    # In the ImageNet models, this line will never be triggered.
    # This is for compatibility in instance segmentation and object detection.

If my task is object detection, how should I set self.dct_h and self.dct_w?
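For readers following this question, the quoted branch pools any feature map whose size differs from (dct_h, dct_w) down to that size before the DCT weights are applied. The sketch below is a pure-Python stand-in for that pooling step, not the repository's code. The bin edges (floor for the start, ceiling for the end) are written to mirror how adaptive average pooling is commonly defined, which is an assumption here, and the 14x14 input and 7x7 target are hypothetical values.

```python
def adaptive_avg_pool2d(feature, out_h, out_w):
    """Average-pool a 2D grid (list of rows) down to out_h x out_w."""
    in_h, in_w = len(feature), len(feature[0])
    pooled = []
    for i in range(out_h):
        # Bin rows: start = floor(i * in / out), end = ceil((i + 1) * in / out).
        r0 = (i * in_h) // out_h
        r1 = -((-(i + 1) * in_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -((-(j + 1) * in_w) // out_w)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Hypothetical 14x14 feature map pooled to a 7x7 DCT grid.
feature = [[float(r * 14 + c) for c in range(14)] for r in range(14)]
pooled = adaptive_avg_pool2d(feature, 7, 7)
print(len(pooled), len(pooled[0]))
```

Under this reading, the layer accepts feature maps of varying size, as in detection, because the input is always reduced to the fixed grid first.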

opened by XFR1998 6
2d dct FLOPs computing method

Hi, I noticed that in your paper you computed FCAnet model FLOPs.

How do you compute the FLOPs of the 2D DCT? Could you provide your formula or code?

Thanks!

opened by TianhaoFu 5
What's the difference between FcaBottleneck and FcaBasicBlock ?

As in your code, the FcaBottleneck expansion is 4 and FcaBasicBlock is 1, FcaBottleneck has one more layer of convolution than FcaBasicBlock, so how should I choose which module to use ?

opened by meiguoofa 3
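About channel grouping

Hello, I am a beginner in deep learning. I added two FCA modules and the mIoU of my original model improved by 2.3, which is a great result. However, I have some other thoughts about channel grouping. If the grouped channels represent different information and each group then uses a different frequency component, this seems likely to cause more information loss. DCT can be viewed as a weighted sum, and the paper shows that apart from GAP, which treats every pixel of a channel equally, all the other components pay more attention to one or a few spatial regions. That is clearly biased, and it also seems to explain why GAP performs best in the single-frequency-component experiments. In that case, might grouping the channels cause even more information loss?

After thinking it over, I believe FCA works mainly because of channel redundancy and a kind of "complementarity" formed by the DCT weighting. Because channels are redundant, some groups may carry similar information while the weights of those groups are "complementary", for example one weight matrix focusing more on the left half and another more on the right half. The module seems to learn better from this kind of "sparse" relationship. One could say that FCA makes fuller use of redundant channels than SE.

I have considered two experiments to test this. First, progressively reduce the number of input channels and compare FCA with the original model or SE: once the channels are reduced to a certain point, the information is no longer so redundant, a lot of information should be lost, and accuracy should fall below that of the original model. Second, for frequency component selection, choose certain "symmetric" or "complementary" weight matrices rather than selecting by single-component performance, and remove the "chaotic" weight matrices, because the single-component experiments show that such chaotic weights do not perform as well as simple block-like ones. In addition, one could use large groups when the channel count is large and small groups when it is small, to check whether this gives better performance.

I may not have fully expressed what I mean. If anything is wrong, please point it out!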

opened by Asthestarsfalll 2
Some problems I ran into when running your model

Hello, I really like your idea, so I tried running your classification model. After downloading the ImageNet2012 dataset, I tried to launch your model and ran into the following problem. Could you tell me whether some of my settings are wrong?

The error message is as follows:

Traceback (most recent call last):
  File "main.py", line 643, in <module>
    main()
  File "main.py", line 389, in main
    avg_train_time = train(train_loader, model, criterion, optimizer, epoch, logger, scheduler)
  File "main.py", line 471, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main.py", line 631, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

opened by LihuiNb 2

selecting frequency components

Hi, I'd like to know how you selected the frequency components shown in Figure 6. I want to select 1, 3, 6, or 10 frequencies in zigzag order, as in zigzag DCT.

I'd also like to know the meaning of the numbers in layer.py.

num_freq = int(method[3:])
if 'top' in method:
    all_top_indices_x = [0,0,6,0,0,1,1,4,5,1,3,0,0,0,3,2,4,6,3,5,5,2,6,5,5,3,3,4,2,2,6,1]
    all_top_indices_y = [0,1,0,5,2,0,2,0,0,6,0,4,6,3,5,2,6,3,3,3,5,1,1,2,4,2,1,1,3,0,5,3]
    mapper_x = all_top_indices_x[:num_freq]
    mapper_y = all_top_indices_y[:num_freq]
elif 'low' in method:
    all_low_indices_x = [0,0,1,1,0,2,2,1,2,0,3,4,0,1,3,0,1,2,3,4,5,0,1,2,3,4,5,6,1,2,3,4]
    all_low_indices_y = [0,1,0,1,2,0,1,2,2,3,0,0,4,3,1,5,4,3,2,1,0,6,5,4,3,2,1,0,6,5,4,3]
    mapper_x = all_low_indices_x[:num_freq]
    mapper_y = all_low_indices_y[:num_freq]
elif 'bot' in method:
    all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
    all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]
    mapper_x = all_bot_indices_x[:num_freq]
    mapper_y = all_bot_indices_y[:num_freq]
else:
    raise NotImplementedError
return mapper_x, mapper_y
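As one possible way to approach the zigzag-style selection asked about here, the sketch below orders the frequency indices of a 7x7 grid by anti-diagonal, from low to high frequency, and returns the first `num_freq` pairs in the same two-list form as `mapper_x` and `mapper_y`. The helper name is hypothetical, it is not part of layer.py, and it does not reproduce the hard-coded 'low' list above. It is also a simplified diagonal order that does not alternate direction the way JPEG zigzag scanning does.

```python
def diagonal_order_indices(grid_size, num_freq):
    """Return (xs, ys) for the first `num_freq` frequencies in diagonal order."""
    pairs = sorted(
        ((x, y) for x in range(grid_size) for y in range(grid_size)),
        key=lambda p: (p[0] + p[1], p[0]),  # lower x + y means lower frequency
    )[:num_freq]
    return [x for x, _ in pairs], [y for _, y in pairs]


# 1, 3, 6 and 10 frequencies each cover whole anti-diagonals of the grid.
for n in (1, 3, 6, 10):
    print(n, diagonal_order_indices(7, n))
```

Counts of 1, 3, 6, and 10 correspond to the first one, two, three, and four complete anti-diagonals, which is why those numbers appear naturally with this ordering.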

opened by InukKang 1

Somewhat inconsistent

In layer.py, class MultiSpectralAttentionLayer(torch.nn.Module) contains:

    self.dct_layer = MultiSpectralDCTLayer(dct_h, dct_w, mapper_x, mapper_y, channel)

so dct_h comes first and dct_w second, that is, h before w. But class MultiSpectralDCTLayer(nn.Module) has:

    def __init__(self, width, height, mapper_x, mapper_y, channel):

so width comes first and height second, that is, w before h. Is there a reason for this? I'm confused.

opened by desertfex 1
dct_h and dct_w

How should I set dct_h and dct_w if I want to add an FCA layer to another model? The feature maps at the layers where I want to insert the FCA layer are 160x160, 80x80, 40x40, and 20x20.

Please advise.

opened by myasser63 5

I would like to ask how 'bot' is selected in the code and what it means.

elif 'bot' in method:
    all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
    all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]

opened by Liutingjin 1

Releases(v1.0)

v1.0(Feb 3, 2021)

upload pretrained models.
Source code(tar.gz)
Source code(zip)
fca101.pth(222.61 MB)
fca152.pth(305.59 MB)
fca34.pth(89.11 MB)
fca50.pth(128.39 MB)
