Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Last update: Jan 04, 2023

Overview

Updates

(2021/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Pyramid Vision Transformer

The image is from Transformers: Revenge of the Fallen.

This repository contains the official implementation of PVTv1 & PVTv2 in image classification, object detection, and semantic segmentation tasks.

Model Zoo

Image Classification

For classification configs and weights, see >>>here<<<.

PVTv2 on ImageNet-1K

Method	Size	Acc@1	#Params (M)
PVTv2-B0	224	70.5	3.7
PVTv2-B1	224	78.7	14.0
PVTv2-B2-Linear	224	82.1	22.6
PVTv2-B2	224	82.0	25.4
PVTv2-B3	224	83.1	45.2
PVTv2-B4	224	83.6	62.6
PVTv2-B5	224	83.8	82.0

PVTv1 on ImageNet-1K

Method	Size	Acc@1	#Params (M)
PVT-Tiny	224	75.1	13.2
PVT-Small	224	79.8	24.5
PVT-Medium	224	81.2	44.2
PVT-Large	224	81.7	61.4

Object Detection

For detection configs and weights, see >>>here<<<.

PVTv2 on COCO

Baseline Detectors

Method	Backbone	Pretrain	Lr schd	Aug	box AP	mask AP
RetinaNet	PVTv2-b0	ImageNet-1K	1x	No	37.2	-
RetinaNet	PVTv2-b1	ImageNet-1K	1x	No	41.2	-
RetinaNet	PVTv2-b2	ImageNet-1K	1x	No	44.6	-
RetinaNet	PVTv2-b3	ImageNet-1K	1x	No	45.9	-
RetinaNet	PVTv2-b4	ImageNet-1K	1x	No	46.1	-
RetinaNet	PVTv2-b5	ImageNet-1K	1x	No	46.2	-
Mask R-CNN	PVTv2-b0	ImageNet-1K	1x	No	38.2	36.2
Mask R-CNN	PVTv2-b1	ImageNet-1K	1x	No	41.8	38.8
Mask R-CNN	PVTv2-b2	ImageNet-1K	1x	No	45.3	41.2
Mask R-CNN	PVTv2-b3	ImageNet-1K	1x	No	47.0	42.5
Mask R-CNN	PVTv2-b4	ImageNet-1K	1x	No	47.5	42.7
Mask R-CNN	PVTv2-b5	ImageNet-1K	1x	No	47.4	42.5

Advanced Detectors

Method	Backbone	Pretrain	Lr schd	Aug	box AP	mask AP
Cascade Mask R-CNN	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	50.9	44.0
Cascade Mask R-CNN	PVTv2-b2	ImageNet-1K	3x	Yes	51.1	44.4
ATSS	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	48.9	-
ATSS	PVTv2-b2	ImageNet-1K	3x	Yes	49.9	-
GFL	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	49.2	-
GFL	PVTv2-b2	ImageNet-1K	3x	Yes	50.2	-
Sparse R-CNN	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	48.9	-
Sparse R-CNN	PVTv2-b2	ImageNet-1K	3x	Yes	50.1	-

PVTv1 on COCO

Detector	Backbone	Pretrain	Lr schd	box AP	mask AP
RetinaNet	PVT-Tiny	ImageNet-1K	1x	36.7	-
RetinaNet	PVT-Small	ImageNet-1K	1x	40.4	-
Mask R-CNN	PVT-Tiny	ImageNet-1K	1x	36.7	35.1
Mask R-CNN	PVT-Small	ImageNet-1K	1x	40.4	37.8
DETR	PVT-Small	ImageNet-1K	50ep	34.7	-

Semantic Segmentation

For segmentation configs and weights, see >>>here<<<.

PVTv1 on ADE20K

Method	Backbone	Pretrain	Iters	mIoU
Semantic FPN	PVT-Tiny	ImageNet-1K	40K	35.7
Semantic FPN	PVT-Small	ImageNet-1K	40K	39.8
Semantic FPN	PVT-Medium	ImageNet-1K	40K	41.6
Semantic FPN	PVT-Large	ImageNet-1K	40K	42.1

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you use this code for a paper, please cite:

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

This repo is currently maintained by Wenhai Wang (@whai362), Enze Xie (@xieenze), and Zhe Chen (@czczup).

Comments

Mask R-CNN configs

Hi, thank you for your great work! Recently we would like to compare your model with ours on the Mask R-CNN results. I wonder if you can provide some configs for Mask R-CNN settings? Thanks!

opened by xwjabc 10
semantic segmentation code

Hi, thanks for your excellent work! I have read your paper 'Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions' and want to apply it to my work on semantic segmentation. When will you make the semantic segmentation code and models public?

opened by hgmlu 8
About FLOPs calculation in Table 2

Hi Wenhai, thanks for this great work.

I have a few questions about the FLOPs calculation in this paper. Previously I tested the DeiT models with ptflops and got 2.51G, 9.20G, and 35.13G FLOPs for DeiT-Tiny, DeiT-Small, and DeiT-Base, respectively.

By the way, I also included the matrix multiplications in the self-attention layer, namely q @ k and attn @ v. I assume there is something wrong with my calculation. May I know how you calculate FLOPs for your experiments?

Thanks.

opened by HubHop 6
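
As a rough aid for this kind of comparison, the cost of the two attention products mentioned above (q @ k and attn @ v) can be counted by hand. The sketch below is not the authors' counting method; it assumes the standard DeiT settings (197 tokens at 224x224 with 16x16 patches, 12 layers, embedding widths 192/384/768) and counts multiply-accumulates (MACs). Tools differ on whether a MAC counts as one or two FLOPs and on whether these products are counted at all, which is one possible source of mismatched totals.

```python
def attention_matmul_macs(num_tokens: int, embed_dim: int, depth: int) -> int:
    """Multiply-accumulates spent in q @ k^T and attn @ v over all layers.

    Summed over heads, each of the two products costs
    num_tokens^2 * embed_dim MACs per layer, so the head count cancels out.
    """
    per_layer = 2 * num_tokens * num_tokens * embed_dim
    return depth * per_layer


# Assumed DeiT settings: 196 patch tokens + 1 class token, 12 layers.
NUM_TOKENS = 197
DEPTH = 12

for name, dim in [("DeiT-Tiny", 192), ("DeiT-Small", 384), ("DeiT-Base", 768)]:
    macs = attention_matmul_macs(NUM_TOKENS, dim, DEPTH)
    print(f"{name}: {macs / 1e9:.3f} GMACs in the attention products")
```

Comparing this figure with the difference between two tools' totals can show whether the attention products explain the gap.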
tkinter.tclerror

Thanks for your work. I tested demo.py and hit the problem below. If I comment out model.show_result, I can obtain the result normally.

Traceback (most recent call last):
  File "demo.py", line 62, in <module>
    main(args)
  File "demo.py", line 35, in main
    model.show_result(
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 327, in show_result
    img = imshow_det_bboxes(
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/core/visualization/image.py", line 113, in imshow_det_bboxes
    fig = plt.figure(win_name, frameon=False)
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 687, in figure
    figManager = new_figure_manager(num, figsize=figsize,
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 315, in new_figure_manager
    return _backend_mod.new_figure_manager(*args, **kwargs)
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
    window = tk.Tk(className="matplotlib")
  File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/tkinter/__init__.py", line 2261, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:10.0"

opened by shengyuan-tang 4
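
The traceback ends in the Tk backend failing to reach a display, which is typical of a remote session without working X forwarding. One common workaround, sketched here as an assumption rather than as the maintainers' fix, is to select matplotlib's non-interactive Agg backend before pyplot is imported and to write figures to a file. The helper name save_plot and the output path are hypothetical, and whether demo.py should save or display its result is a separate choice.

```python
import os

# MPLBACKEND is read when matplotlib is first imported, so set it beforehand.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib.pyplot as plt
except ImportError:  # matplotlib is optional for this sketch
    plt = None


def save_plot(path: str) -> bool:
    """Draw a trivial figure and write it to `path` without opening a window.

    Returns False when matplotlib is not installed.
    """
    if plt is None:
        return False
    fig = plt.figure(frameon=False)
    ax = fig.add_subplot(111)
    ax.plot([0, 1], [0, 1])
    fig.savefig(path)
    plt.close(fig)
    return True
```

With the Agg backend active, plt.figure no longer tries to create a Tk window, so the same call that failed in the traceback should not need a display.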
how can i load pickle file?

Thanks for sharing the code. I'm trying to load a pickle file and read it using these commands:

import pickle
infile = open('data.pkl','rb')
new_dict = pickle.load(infile)
infile.close()
print(type(new_dict))

but the error is:

_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

I searched for a solution and found only that the pickle file appears to use advanced features, which suggests it was never meant to be loaded directly this way. Can you help, please?

opened by mathshangw 4
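
The error message means the pickle stream refers to objects stored outside the stream by "persistent id". PyTorch checkpoints use this mechanism for tensor storages, so a data.pkl taken out of a checkpoint archive is normally read by calling torch.load on the original checkpoint file rather than by plain pickle. The stdlib-only sketch below reproduces the same error and shows what a matching loader has to provide; the class names and the "blob" tag are hypothetical and unrelated to PyTorch's internals.

```python
import io
import pickle


class ExternalBlob:
    """Stand-in for data that is stored outside the pickle stream."""

    def __init__(self, key):
        self.key = key


class SavingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Write only a reference for external blobs; pickle everything else.
        if isinstance(obj, ExternalBlob):
            return ("blob", obj.key)
        return None


class LoadingUnpickler(pickle.Unpickler):
    def __init__(self, file, blobs):
        super().__init__(file)
        self.blobs = blobs

    def persistent_load(self, pid):
        # Resolve the reference written by SavingPickler.persistent_id.
        tag, key = pid
        if tag != "blob":
            raise pickle.UnpicklingError(f"unknown persistent id: {pid!r}")
        return self.blobs[key]


buffer = io.BytesIO()
SavingPickler(buffer).dump({"weight": ExternalBlob("0")})
payload = buffer.getvalue()

try:
    pickle.loads(payload)  # plain loading has no persistent_load
    plain_load_error = None
except pickle.UnpicklingError as exc:
    plain_load_error = str(exc)

restored = LoadingUnpickler(io.BytesIO(payload), {"0": [1.0, 2.0]}).load()
print(plain_load_error)
print(restored)
```

The plain pickle.loads call fails in the same way as in the question, while the custom unpickler succeeds because it knows where the external data lives.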
question for PVTv2: in the paper the reduction ratio is 7 in Linear SRA, but in the code is sr_ratios=[8, 4, 2, 1]

question for PVTv2: in the paper the reduction ratio is 7 in Linear SRA, but in the code is sr_ratios=[8, 4, 2, 1],

Is there something wrong with my understanding?

opened by StormArcher 3
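
One way to compare the two numbers is to count how many key/value tokens each scheme leaves per stage. The sketch below assumes stage strides of 4, 8, 16, and 32, that regular SRA shrinks each side of the feature map by sr_ratio, and that Linear SRA pools to a fixed 7x7 grid regardless of sr_ratios. These are assumptions to check against the model code, not statements taken from it.

```python
STAGE_STRIDES = (4, 8, 16, 32)  # assumed downsampling of each stage
SR_RATIOS = (8, 4, 2, 1)        # values quoted in the question above
POOL_SIZE = 7                   # pooled size quoted from the paper


def sra_kv_tokens(image_size: int, stride: int, sr_ratio: int) -> int:
    """Key/value tokens when each side of the map is reduced by sr_ratio."""
    side = image_size // stride
    return (side // sr_ratio) ** 2


def linear_sra_kv_tokens(pool_size: int = POOL_SIZE) -> int:
    """Key/value tokens when the map is pooled to pool_size x pool_size."""
    return pool_size ** 2


for image_size in (224, 800):
    sra = [sra_kv_tokens(image_size, s, r)
           for s, r in zip(STAGE_STRIDES, SR_RATIOS)]
    linear = [linear_sra_kv_tokens() for _ in STAGE_STRIDES]
    print(image_size, "SRA:", sra, "Linear SRA:", linear)
```

Under these assumptions both schemes give 49 tokens per stage at 224x224, and they diverge only at larger inputs, where the SRA count grows with resolution while the pooled count stays fixed.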
Low mAP on coco val

Hello, thanks for your work. I was trying to train the RetinaNet-FPN-PVTv2-B2-1x model on COCO2017. The reported mAP on the val set is 44.6, but the result I got after training was only 33.5. Is there anything wrong?

I trained on 8 V100 GPUs using your provided pre-trained model pvt_v2_b2.pth. Training script was: ./dist_train.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py 8

The config file was:

model = dict(
    type='RetinaNet',
    pretrained='/opt/tiger/wanxingyu_tfm/pvt/pretrained/pvt_v2_b2.pth',
    backbone=dict(
        type='pvt_v2_b2',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 320, 512],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = '/opt/tiger/wanxingyu_tfm/datasets/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file=
        '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_train2017.json',
        img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file=
        '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json',
        img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file=
        '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json',
        img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco'
gpu_ids = range(0, 8)

Test script was: ./dist_test.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco/epoch_12.pth 8 --eval bbox

The result I got:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.335
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.514
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.352
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.190
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.356
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.561
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.683
OrderedDict([('bbox_mAP', 0.335), ('bbox_mAP_50', 0.514), ('bbox_mAP_75', 0.352), ('bbox_mAP_s', 0.19), ('bbox_mAP_m', 0.356), ('bbox_mAP_l', 0.45), ('bbox_mAP_copypaste', '0.335 0.514 0.352 0.190 0.356 0.450')])

opened by memorywxy 3
How can I get small_pvt.pth?

I ran your main.py, but I'm confused about what this class does. It gave me the accuracy and loss over 500 epochs, right? Then I tried to train on my images with this command: 'dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 1'

I got an error that small_pvt.pth was not found. Is that file the weights or a checkpoint?

Is the small_pvt.pth here https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing for ImageNet? And how can I get a .pth file if the dataset is different? I appreciate your reply. Thanks.

opened by SamMohel 3
problems about loading pretrained model with pytorch version below 1.6

problems about loading pretrained model with pytorch version below 1.6

PyTorch 1.6 switched torch.save to a zip-file-based format by default instead of the old pickle-based format. As a result, PyTorch versions below 1.6 cannot load the pretrained models AT ALL.

Could you pass "_use_new_zipfile_serialization=False" when calling torch.save(), as in torch.save(m.state_dict(), 'mymodel.pt', _use_new_zipfile_serialization=False), and provide another version of the pretrained models?

Thanks a lot!!!!

opened by WxWstranger 3
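
Because the newer format described above is a zip archive, a quick way to tell which format a downloaded checkpoint uses, without importing PyTorch, is to test whether the file is a valid zip. This is a minimal stdlib sketch; the function name checkpoint_format and the file names are hypothetical, and the demo files are empty stand-ins rather than real checkpoints.

```python
import os
import tempfile
import zipfile


def checkpoint_format(path: str) -> str:
    """Return 'zip' for the zip-based layout, otherwise 'legacy'."""
    return "zip" if zipfile.is_zipfile(path) else "legacy"


# Demo with stand-in files created on the fly.
demo_dir = tempfile.mkdtemp()

zip_like = os.path.join(demo_dir, "new_style.pth")
with zipfile.ZipFile(zip_like, "w") as archive:
    archive.writestr("archive/data.pkl", b"placeholder")

legacy_like = os.path.join(demo_dir, "old_style.pth")
with open(legacy_like, "wb") as handle:
    handle.write(b"\x80\x02placeholder")

print(checkpoint_format(zip_like))     # zip
print(checkpoint_format(legacy_like))  # legacy
```

A file reported as zip would need to be re-saved from PyTorch 1.6 or later with the flag quoted in the post before an older PyTorch could read it.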
PVT Large doesn't converge
Thanks for your great work. But when I trained PVT Large (pvt_large) with your default settings, the model didn't converge. The loss declined normally in the first 37 epochs and the accuracy reached 57%, but the model went wrong at the 38th epoch. I used your code without any change. What's the problem? Thank you!

Below is a part of my training log.

Test: Total time: 0:01:55 (0.4429 s / it)

Acc@1 57.009 Acc@5 81.174 loss 1.948
Accuracy of the network on the 50000 test images: 57.0%
Max accuracy: 57.01%
Epoch: [38] [ 0/1251] eta: 2:06:33 lr: 0.000963 loss: 4.9324 (4.9324) time: 6.0701 data: 3.6057 max mem: 25529 Epoch: [38] [ 10/1251] eta: 0:31:59 lr: 0.000963 loss: 4.5930 (4.5768) time: 1.5465 data: 0.3281 max mem: 25529 Epoch: [38] [ 20/1251] eta: 0:27:07 lr: 0.000963 loss: 4.6624 (4.6160) time: 1.0843 data: 0.0003 max mem: 25529 Epoch: [38] [ 30/1251] eta: 0:25:15 lr: 0.000963 loss: 4.7355 (4.5806) time: 1.0737 data: 0.0003 max mem: 25529 Epoch: [38] [ 40/1251] eta: 0:24:16 lr: 0.000963 loss: 4.6986 (4.5811) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 50/1251] eta: 0:23:33 lr: 0.000963 loss: 4.6986 (4.5609) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 60/1251] eta: 0:23:07 lr: 0.000963 loss: 4.7104 (4.5901) time: 1.0864 data: 0.0003 max mem: 25529 Epoch: [38] [ 70/1251] eta: 0:22:39 lr: 0.000963 loss: 4.8095 (4.6143) time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 80/1251] eta: 0:22:17 lr: 0.000963 loss: 4.7373 (4.5898) time: 1.0721 data: 0.0003 max mem: 25529 Epoch: [38] [ 90/1251] eta: 0:21:55 lr: 0.000963 loss: 4.4603 (4.5742) time: 1.0696 data: 0.0003 max mem: 25529 Epoch: [38] [ 100/1251] eta: 0:21:37 lr: 0.000963 loss: 4.5539 (4.5777) time: 1.0682 data: 0.0003 max mem: 25529 Epoch: [38] [ 110/1251] eta: 0:21:21 lr: 0.000963 loss: 4.9701 (4.5993) time: 1.0787 data: 0.0003 max mem: 25529 Epoch: [38] [ 120/1251] eta: 0:21:06 lr: 0.000963 loss: 4.9029 (4.5914) time: 1.0811 data: 0.0003 max mem: 25529 Epoch: [38] [ 130/1251] eta: 0:20:50 lr: 0.000963 loss: 4.7300 (4.5999) time: 1.0711 data: 0.0003 max mem: 25529 Epoch: [38] [ 140/1251] eta: 0:20:35 lr: 0.000963 loss: 4.7998 (4.5936) time: 1.0630 data: 0.0003 max mem: 25529 Epoch: [38] [ 150/1251] eta: 0:20:23 lr: 0.000963 loss: 4.8562 (4.5969) time: 1.0850 data: 0.0003 max mem: 25529 Epoch: [38] [ 160/1251] eta: 0:20:09 lr: 0.000963 loss: 4.8583 
(4.5961) time: 1.0852 data: 0.0003 max mem: 25529 Epoch: [38] [ 170/1251] eta: 0:19:55 lr: 0.000963 loss: 4.8583 (4.6029) time: 1.0677 data: 0.0003 max mem: 25529 Epoch: [38] [ 180/1251] eta: 0:19:42 lr: 0.000963 loss: 5.0298 (4.6202) time: 1.0675 data: 0.0003 max mem: 25529 Epoch: [38] [ 190/1251] eta: 0:19:28 lr: 0.000963 loss: 4.8480 (4.6175) time: 1.0634 data: 0.0003 max mem: 25529 Epoch: [38] [ 200/1251] eta: 0:19:15 lr: 0.000963 loss: 4.6446 (4.6124) time: 1.0629 data: 0.0003 max mem: 25529 Epoch: [38] [ 210/1251] eta: 0:19:04 lr: 0.000963 loss: 4.8329 (4.6245) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 220/1251] eta: 0:18:52 lr: 0.000963 loss: 4.9058 (4.6362) time: 1.0833 data: 0.0003 max mem: 25529 Epoch: [38] [ 230/1251] eta: 0:18:40 lr: 0.000963 loss: 4.7250 (4.6332) time: 1.0764 data: 0.0003 max mem: 25529 Epoch: [38] [ 240/1251] eta: 0:18:28 lr: 0.000963 loss: 4.6894 (4.6391) time: 1.0808 data: 0.0003 max mem: 25529 Epoch: [38] [ 250/1251] eta: 0:18:16 lr: 0.000963 loss: 4.8600 (4.6438) time: 1.0789 data: 0.0003 max mem: 25529 Epoch: [38] [ 260/1251] eta: 0:18:04 lr: 0.000963 loss: 4.9939 (4.6550) time: 1.0710 data: 0.0003 max mem: 25529 Epoch: [38] [ 270/1251] eta: 0:17:53 lr: 0.000963 loss: 4.7281 (4.6478) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [ 280/1251] eta: 0:17:41 lr: 0.000963 loss: 4.3858 (4.6383) time: 1.0664 data: 0.0003 max mem: 25529 Epoch: [38] [ 290/1251] eta: 0:17:29 lr: 0.000963 loss: 4.5126 (4.6390) time: 1.0627 data: 0.0003 max mem: 25529 Epoch: [38] [ 300/1251] eta: 0:17:17 lr: 0.000963 loss: 4.3964 (4.6302) time: 1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 310/1251] eta: 0:17:05 lr: 0.000963 loss: 4.3964 (4.6284) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 320/1251] eta: 0:16:54 lr: 0.000963 loss: 4.4917 (4.6220) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 330/1251] eta: 0:16:42 lr: 0.000963 loss: 4.7606 (4.6335) time: 1.0695 data: 0.0003 max mem: 25529 Epoch: [38] [ 340/1251] eta: 
0:16:31 lr: 0.000963 loss: 5.0333 (4.6346) time: 1.0699 data: 0.0003 max mem: 25529 Epoch: [38] [ 350/1251] eta: 0:16:20 lr: 0.000963 loss: 4.6795 (4.6276) time: 1.0700 data: 0.0003 max mem: 25529 Epoch: [38] [ 360/1251] eta: 0:16:08 lr: 0.000963 loss: 4.7723 (4.6305) time: 1.0728 data: 0.0003 max mem: 25529 Epoch: [38] [ 370/1251] eta: 0:15:57 lr: 0.000963 loss: 4.8322 (4.6305) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 380/1251] eta: 0:15:46 lr: 0.000963 loss: 4.7535 (4.6310) time: 1.0725 data: 0.0003 max mem: 25529 Epoch: [38] [ 390/1251] eta: 0:15:35 lr: 0.000963 loss: 4.5236 (4.6247) time: 1.0746 data: 0.0003 max mem: 25529 Epoch: [38] [ 400/1251] eta: 0:15:24 lr: 0.000963 loss: 4.5129 (4.6280) time: 1.0783 data: 0.0003 max mem: 25529 Epoch: [38] [ 410/1251] eta: 0:15:13 lr: 0.000963 loss: 4.6520 (4.6250) time: 1.0803 data: 0.0003 max mem: 25529 Epoch: [38] [ 420/1251] eta: 0:15:02 lr: 0.000963 loss: 4.6115 (4.6235) time: 1.0841 data: 0.0003 max mem: 25529 Epoch: [38] [ 430/1251] eta: 0:14:51 lr: 0.000963 loss: 4.5550 (4.6176) time: 1.0788 data: 0.0003 max mem: 25529 Epoch: [38] [ 440/1251] eta: 0:14:40 lr: 0.000963 loss: 4.3985 (4.6097) time: 1.0745 data: 0.0003 max mem: 25529 Epoch: [38] [ 450/1251] eta: 0:14:29 lr: 0.000963 loss: 4.5041 (4.6144) time: 1.0711 data: 0.0004 max mem: 25529 Epoch: [38] [ 460/1251] eta: 0:14:18 lr: 0.000963 loss: 4.7949 (4.6127) time: 1.0769 data: 0.0003 max mem: 25529 Epoch: [38] [ 470/1251] eta: 0:14:07 lr: 0.000963 loss: 4.7556 (4.6148) time: 1.0773 data: 0.0003 max mem: 25529 Epoch: [38] [ 480/1251] eta: 0:13:56 lr: 0.000963 loss: 5.0523 (4.6200) time: 1.0845 data: 0.0003 max mem: 25529 Epoch: [38] [ 490/1251] eta: 0:13:45 lr: 0.000963 loss: 4.5865 (4.6152) time: 1.0781 data: 0.0003 max mem: 25529 Epoch: [38] [ 500/1251] eta: 0:13:34 lr: 0.000963 loss: 4.6311 (4.6210) time: 1.0776 data: 0.0003 max mem: 25529 Epoch: [38] [ 510/1251] eta: 0:13:23 lr: 0.000963 loss: 4.8767 (4.6208) time: 1.0855 data: 0.0003 max mem: 
25529 Epoch: [38] [ 520/1251] eta: 0:13:13 lr: 0.000963 loss: 4.7439 (4.6204) time: 1.0891 data: 0.0003 max mem: 25529 Epoch: [38] [ 530/1251] eta: 0:13:02 lr: 0.000963 loss: 4.7974 (4.6190) time: 1.0813 data: 0.0003 max mem: 25529 Epoch: [38] [ 540/1251] eta: 0:12:51 lr: 0.000963 loss: 4.6865 (4.6171) time: 1.0676 data: 0.0003 max mem: 25529 Epoch: [38] [ 550/1251] eta: 0:12:40 lr: 0.000963 loss: 4.4560 (4.6144) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 560/1251] eta: 0:12:29 lr: 0.000963 loss: 4.2302 (4.6069) time: 1.0761 data: 0.0003 max mem: 25529 Epoch: [38] [ 570/1251] eta: 0:12:18 lr: 0.000963 loss: 4.3246 (4.6080) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 580/1251] eta: 0:12:07 lr: 0.000963 loss: 4.5513 (4.6052) time: 1.0661 data: 0.0003 max mem: 25529 Epoch: [38] [ 590/1251] eta: 0:11:56 lr: 0.000963 loss: 4.4924 (4.6075) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 600/1251] eta: 0:11:45 lr: 0.000963 loss: 4.5949 (4.6052) time: 1.0817 data: 0.0003 max mem: 25529 Epoch: [38] [ 610/1251] eta: 0:11:34 lr: 0.000963 loss: 4.5321 (4.6035) time: 1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 620/1251] eta: 0:11:23 lr: 0.000963 loss: 4.7689 (4.6075) time: 1.0604 data: 0.0003 max mem: 25529 Epoch: [38] [ 630/1251] eta: 0:11:12 lr: 0.000963 loss: 4.7689 (4.6088) time: 1.0649 data: 0.0003 max mem: 25529 Epoch: [38] [ 640/1251] eta: 0:11:01 lr: 0.000963 loss: 4.4721 (4.6039) time: 1.0580 data: 0.0003 max mem: 25529 Epoch: [38] [ 650/1251] eta: 0:10:50 lr: 0.000963 loss: 4.5410 (4.6067) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 660/1251] eta: 0:10:39 lr: 0.000963 loss: 4.5659 (4.5996) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 670/1251] eta: 0:10:28 lr: 0.000963 loss: 4.4456 (4.5999) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 680/1251] eta: 0:10:17 lr: 0.000963 loss: 4.8766 (4.6035) time: 1.0818 data: 0.0003 max mem: 25529 Epoch: [38] [ 690/1251] eta: 0:10:06 lr: 0.000963 loss: 4.8766 (4.6041) 
time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 700/1251] eta: 0:09:55 lr: 0.000963 loss: 4.9327 (4.6104) time: 1.0805 data: 0.0003 max mem: 25529 Epoch: [38] [ 710/1251] eta: 0:09:44 lr: 0.000963 loss: 5.0049 (4.6129) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [ 720/1251] eta: 0:09:34 lr: 0.000963 loss: 4.6922 (4.6117) time: 1.0673 data: 0.0003 max mem: 25529 Epoch: [38] [ 730/1251] eta: 0:09:23 lr: 0.000963 loss: 4.6331 (4.6107) time: 1.0810 data: 0.0003 max mem: 25529 Epoch: [38] [ 740/1251] eta: 0:09:12 lr: 0.000963 loss: 4.5547 (4.6111) time: 1.0795 data: 0.0003 max mem: 25529 Epoch: [38] [ 750/1251] eta: 0:09:01 lr: 0.000963 loss: 4.8843 (4.6181) time: 1.0719 data: 0.0003 max mem: 25529 Epoch: [38] [ 760/1251] eta: 0:08:50 lr: 0.000963 loss: 4.8843 (4.6160) time: 1.0851 data: 0.0003 max mem: 25529 Epoch: [38] [ 770/1251] eta: 0:08:40 lr: 0.000963 loss: 4.2934 (4.6119) time: 1.0840 data: 0.0003 max mem: 25529 Epoch: [38] [ 780/1251] eta: 0:08:29 lr: 0.000963 loss: 4.1930 (4.6087) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 790/1251] eta: 0:08:18 lr: 0.000963 loss: 4.4176 (4.6073) time: 1.0748 data: 0.0003 max mem: 25529 Epoch: [38] [ 800/1251] eta: 0:08:07 lr: 0.000963 loss: 4.7402 (4.6115) time: 1.0681 data: 0.0003 max mem: 25529 Epoch: [38] [ 810/1251] eta: 0:07:56 lr: 0.000963 loss: 4.7749 (4.6094) time: 1.0713 data: 0.0003 max mem: 25529 Epoch: [38] [ 820/1251] eta: 0:07:45 lr: 0.000963 loss: 4.6709 (4.6079) time: 1.0732 data: 0.0003 max mem: 25529 Epoch: [38] [ 830/1251] eta: 0:07:34 lr: 0.000963 loss: 4.7506 (4.6088) time: 1.0641 data: 0.0003 max mem: 25529 Epoch: [38] [ 840/1251] eta: 0:07:23 lr: 0.000963 loss: 4.8636 (4.6112) time: 1.0592 data: 0.0003 max mem: 25529 Epoch: [38] [ 850/1251] eta: 0:07:13 lr: 0.000963 loss: 4.9930 (4.6116) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 860/1251] eta: 0:07:02 lr: 0.000963 loss: 5.0639 (4.6155) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 870/1251] eta: 0:06:51 
lr: 0.000963 loss: 5.0486 (4.6160) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 880/1251] eta: 0:06:40 lr: 0.000963 loss: 4.6785 (4.6145) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 890/1251] eta: 0:06:29 lr: 0.000963 loss: 4.6382 (4.6126) time: 1.0603 data: 0.0003 max mem: 25529 Epoch: [38] [ 900/1251] eta: 0:06:18 lr: 0.000963 loss: 4.9989 (4.6179) time: 1.0642 data: 0.0003 max mem: 25529 Epoch: [38] [ 910/1251] eta: 0:06:08 lr: 0.000963 loss: 5.0227 (4.6205) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 920/1251] eta: 0:05:57 lr: 0.000963 loss: 4.7505 (4.6198) time: 1.0733 data: 0.0003 max mem: 25529 Epoch: [38] [ 930/1251] eta: 0:05:46 lr: 0.000963 loss: 4.6593 (4.6196) time: 1.0636 data: 0.0003 max mem: 25529 Epoch: [38] [ 940/1251] eta: 0:05:35 lr: 0.000963 loss: 4.7349 (4.6184) time: 1.0697 data: 0.0003 max mem: 25529 Epoch: [38] [ 950/1251] eta: 0:05:24 lr: 0.000963 loss: 4.8424 (4.6185) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 960/1251] eta: 0:05:13 lr: 0.000963 loss: 4.5308 (4.6170) time: 1.0704 data: 0.0003 max mem: 25529 Epoch: [38] [ 970/1251] eta: 0:05:03 lr: 0.000963 loss: 4.6764 (4.6186) time: 1.0749 data: 0.0003 max mem: 25529 Epoch: [38] [ 980/1251] eta: 0:04:52 lr: 0.000963 loss: 4.6764 (4.6176) time: 1.0768 data: 0.0004 max mem: 25529 Epoch: [38] [ 990/1251] eta: 0:04:41 lr: 0.000963 loss: 4.5145 (4.6176) time: 1.0677 data: 0.0004 max mem: 25529 Epoch: [38] [1000/1251] eta: 0:04:30 lr: 0.000963 loss: 4.5645 (4.6202) time: 1.0686 data: 0.0003 max mem: 25529 Epoch: [38] [1010/1251] eta: 0:04:19 lr: 0.000963 loss: 5.3548 (4.6373) time: 1.0613 data: 0.0003 max mem: 25529 Epoch: [38] [1020/1251] eta: 0:04:09 lr: 0.000963 loss: 6.9353 (4.6599) time: 1.0595 data: 0.0003 max mem: 25529 Epoch: [38] [1030/1251] eta: 0:03:58 lr: 0.000963 loss: 6.9423 (4.6820) time: 1.0729 data: 0.0003 max mem: 25529 Epoch: [38] [1040/1251] eta: 0:03:47 lr: 0.000963 loss: 6.9381 (4.7036) time: 1.0715 data: 0.0003 max mem: 25529 
Epoch: [38] [1050/1251] eta: 0:03:36 lr: 0.000963 loss: 6.9351 (4.7248) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [1060/1251] eta: 0:03:25 lr: 0.000963 loss: 6.9315 (4.7456) time: 1.0655 data: 0.0003 max mem: 25529 Epoch: [38] [1070/1251] eta: 0:03:15 lr: 0.000963 loss: 6.9319 (4.7660) time: 1.0609 data: 0.0003 max mem: 25529 Epoch: [38] [1080/1251] eta: 0:03:04 lr: 0.000963 loss: 6.9287 (4.7860) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [1090/1251] eta: 0:02:53 lr: 0.000963 loss: 6.9198 (4.8055) time: 1.0834 data: 0.0003 max mem: 25529 Epoch: [38] [1100/1251] eta: 0:02:42 lr: 0.000963 loss: 6.9219 (4.8248) time: 1.0835 data: 0.0003 max mem: 25529 Epoch: [38] [1110/1251] eta: 0:02:32 lr: 0.000963 loss: 6.9286 (4.8437) time: 1.1036 data: 0.0003 max mem: 25529 Epoch: [38] [1120/1251] eta: 0:02:21 lr: 0.000963 loss: 6.9209 (4.8622) time: 1.0965 data: 0.0003 max mem: 25529 Epoch: [38] [1130/1251] eta: 0:02:10 lr: 0.000963 loss: 6.9212 (4.8804) time: 1.0701 data: 0.0003 max mem: 25529 Epoch: [38] [1140/1251] eta: 0:01:59 lr: 0.000963 loss: 6.9192 (4.8983) time: 1.0686 data: 0.0003 max mem: 25529 Epoch: [38] [1150/1251] eta: 0:01:48 lr: 0.000963 loss: 6.9192 (4.9159) time: 1.0640 data: 0.0003 max mem: 25529 Epoch: [38] [1160/1251] eta: 0:01:38 lr: 0.000963 loss: 6.9231 (4.9332) time: 1.0687 data: 0.0003 max mem: 25529 Epoch: [38] [1170/1251] eta: 0:01:27 lr: 0.000963 loss: 6.9241 (4.9502) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [1180/1251] eta: 0:01:16 lr: 0.000963 loss: 6.9240 (4.9669) time: 1.0687 data: 0.0003 max mem: 25529 Epoch: [38] [1190/1251] eta: 0:01:05 lr: 0.000963 loss: 6.9198 (4.9833) time: 1.0668 data: 0.0003 max mem: 25529 Epoch: [38] [1200/1251] eta: 0:00:54 lr: 0.000963 loss: 6.9150 (4.9993) time: 1.0864 data: 0.0003 max mem: 25529 Epoch: [38] [1210/1251] eta: 0:00:44 lr: 0.000963 loss: 6.9144 (5.0152) time: 1.0855 data: 0.0003 max mem: 25529 Epoch: [38] [1220/1251] eta: 0:00:33 lr: 0.000963 loss: 6.9167 (5.0308) time: 
1.0714 data: 0.0003 max mem: 25529 Epoch: [38] [1230/1251] eta: 0:00:22 lr: 0.000963 loss: 6.9167 (5.0461) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [1240/1251] eta: 0:00:11 lr: 0.000963 loss: 6.9135 (5.0612) time: 1.0574 data: 0.0005 max mem: 25529 Epoch: [38] [1250/1251] eta: 0:00:01 lr: 0.000963 loss: 6.9179 (5.0760) time: 1.0532 data: 0.0004 max mem: 25529 Epoch: [38] Total time: 0:22:28 (1.0781 s / it) Averaged stats: lr: 0.000963 loss: 6.9179 (5.0558) Test: [ 0/261] eta: 0:31:19 loss: 6.8103 (6.8103) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 7.2018 data: 6.7932 max mem: 25529 Test: [ 10/261] eta: 0:04:17 loss: 6.9766 (6.9290) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 1.0263 data: 0.6262 max mem: 25529 Test: [ 20/261] eta: 0:02:56 loss: 6.9750 (6.9375) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4103 data: 0.0066 max mem: 25529 Test: [ 30/261] eta: 0:02:25 loss: 6.9495 (6.9457) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4091 data: 0.0024 max mem: 25529 Test: [ 40/261] eta: 0:02:06 loss: 6.9158 (6.9258) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.6352) time: 0.4017 data: 0.0010 max mem: 25529 Test: [ 50/261] eta: 0:01:53 loss: 6.8871 (6.9364) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.5106) time: 0.3975 data: 0.0007 max mem: 25529 Test: [ 60/261] eta: 0:01:43 loss: 6.9326 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.4269) time: 0.3969 data: 0.0007 max mem: 25529 Test: [ 70/261] eta: 0:01:35 loss: 6.8942 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3668) time: 0.3951 data: 0.0016 max mem: 25529 Test: [ 80/261] eta: 0:01:27 loss: 6.8974 (6.9259) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3215) time: 0.3954 data: 0.0025 max mem: 25529 Test: [ 90/261] eta: 0:01:21 loss: 6.9066 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2862) time: 0.3983 data: 0.0017 max mem: 25529 Test: [100/261] eta: 0:01:15 loss: 6.9556 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2578) time: 0.3960 data: 0.0009 max mem: 25529 Test: [110/261] eta: 0:01:09 
loss: 6.9268 (6.9298) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2346) time: 0.3962 data: 0.0010 max mem: 25529
Test: [120/261] eta: 0:01:04 loss: 6.8970 (6.9270) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2152) time: 0.4211 data: 0.0242 max mem: 25529
Test: [130/261] eta: 0:00:59 loss: 6.8970 (6.9251) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.1988) time: 0.4183 data: 0.0242 max mem: 25529
Test: [140/261] eta: 0:00:54 loss: 6.9251 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3694) time: 0.3986 data: 0.0018 max mem: 25529
Test: [150/261] eta: 0:00:49 loss: 6.9534 (6.9264) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3449) time: 0.4021 data: 0.0045 max mem: 25529
Test: [160/261] eta: 0:00:45 loss: 6.8927 (6.9243) acc1: 0.0000 (0.1617) acc5: 0.0000 (0.4852) time: 0.4124 data: 0.0182 max mem: 25529
Test: [170/261] eta: 0:00:40 loss: 6.8886 (6.9231) acc1: 0.0000 (0.1523) acc5: 0.0000 (0.4569) time: 0.4112 data: 0.0157 max mem: 25529
Test: [180/261] eta: 0:00:35 loss: 6.9188 (6.9233) acc1: 0.0000 (0.1439) acc5: 0.0000 (0.4316) time: 0.3997 data: 0.0016 max mem: 25529
Test: [190/261] eta: 0:00:31 loss: 6.9170 (6.9216) acc1: 0.0000 (0.1363) acc5: 0.0000 (0.4090) time: 0.4233 data: 0.0265 max mem: 25529
Test: [200/261] eta: 0:00:26 loss: 6.9137 (6.9224) acc1: 0.0000 (0.1296) acc5: 0.0000 (0.3887) time: 0.4463 data: 0.0536 max mem: 25529
Test: [210/261] eta: 0:00:22 loss: 6.9097 (6.9210) acc1: 0.0000 (0.1234) acc5: 0.0000 (0.3703) time: 0.5000 data: 0.1046 max mem: 25529
Test: [220/261] eta: 0:00:18 loss: 6.8762 (6.9184) acc1: 0.0000 (0.1178) acc5: 0.0000 (0.3535) time: 0.4731 data: 0.0773 max mem: 25529
Test: [230/261] eta: 0:00:13 loss: 6.8775 (6.9185) acc1: 0.0000 (0.1127) acc5: 0.0000 (0.4509) time: 0.3974 data: 0.0048 max mem: 25529
Test: [240/261] eta: 0:00:09 loss: 6.9246 (6.9183) acc1: 0.0000 (0.1081) acc5: 0.0000 (0.4322) time: 0.4009 data: 0.0050 max mem: 25529
Test: [250/261] eta: 0:00:04 loss: 6.9132 (6.9190) acc1: 0.0000 (0.1038) acc5: 0.0000 (0.5188) time: 0.3949 data: 0.0010 max mem: 25529
Test: [260/261] eta: 0:00:00 loss: 6.9128 (6.9180) acc1: 0.0000 (0.1000) acc5: 0.0000 (0.5000) time: 0.3788 data: 0.0001 max mem: 25529
Test: Total time: 0:01:54 (0.4370 s / it)

Acc@1 0.100 Acc@5 0.500 loss 6.918
Accuracy of the network on the 50000 test images: 0.1%
Max accuracy: 57.01%
opened by VictorLlu 3
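
A quick sanity check on the evaluation log above, offered as a reader's sketch and not as the maintainers' diagnosis: on a 1000-class problem such as ImageNet-1K, a model that predicts uniformly at random has a cross-entropy loss of ln(1000) and top-1/top-5 accuracies of 0.1% and 0.5%. The reported loss of 6.918 and accuracies of 0.100 and 0.500 sit very close to those values, which is consistent with weights that were never loaded into the model. The function name `chance_level` is hypothetical.

```python
import math


def chance_level(num_classes: int, k: int = 5):
    """Return (loss, top-1 %, top-k %) for uniformly random predictions."""
    loss = math.log(num_classes)      # cross-entropy of a uniform distribution
    top1 = 100.0 / num_classes        # one correct guess out of num_classes
    topk = 100.0 * k / num_classes    # k guesses out of num_classes
    return loss, top1, topk


loss, top1, top5 = chance_level(1000)
print(f"loss {loss:.3f}  Acc@1 {top1:.3f}  Acc@5 {top5:.3f}")
# prints: loss 6.908  Acc@1 0.100  Acc@5 0.500
```

If a run lands on these numbers, checking that the checkpoint path and the state-dict keys match the model is a reasonable first step.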
pretrained model load

Hello, I am very interested in your work. I ran into a problem when loading the pretrained model with checkpoint = torch.load(args.finetune, map_location='cpu').

While debugging, I stopped at pos_embed_checkpoint = checkpoint_model['pos_embed']: the checkpoint has "pos_embed1", "pos_embed2", "pos_embed3", and "pos_embed4", but no "pos_embed".

opened by surelyee 3
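
For the positional-embedding question above, a minimal sketch of one possible workaround: instead of hardcoding the key 'pos_embed', collect every key that matches the per-stage naming reported in the issue ("pos_embed1" to "pos_embed4"). The helper `find_pos_embed_keys`, the stand-in dictionary, and the "head.weight" key are hypothetical; a real checkpoint would hold tensors, and any resizing of each embedding is left to the caller.

```python
import re


def find_pos_embed_keys(state_dict):
    """Return keys like 'pos_embed' or 'pos_embed3', sorted by stage index."""
    pattern = re.compile(r"^pos_embed(\d*)$")
    matches = []
    for key in state_dict:
        m = pattern.match(key)
        if m:
            # A bare 'pos_embed' (no stage suffix) sorts first as stage 0.
            stage = int(m.group(1)) if m.group(1) else 0
            matches.append((stage, key))
    return [key for _, key in sorted(matches)]


# Stand-in for a loaded checkpoint (hypothetical contents; values would be tensors).
checkpoint_model = {
    "pos_embed2": None,
    "pos_embed1": None,
    "head.weight": None,
    "pos_embed4": None,
    "pos_embed3": None,
}

keys = find_pos_embed_keys(checkpoint_model)
print(keys)
# prints: ['pos_embed1', 'pos_embed2', 'pos_embed3', 'pos_embed4']
```

Looping over `keys` lets the same fine-tuning code handle both a single-embedding checkpoint and a per-stage one.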
Why there is no DETR+PVTv2 in object detection?

I noticed that there is DETR+PVTv1, although its AP is not satisfactory. Why is there no implementation of DETR+PVTv2? Is it ineffective, or just not provided yet?

opened by yuhua666 0
Did you train PVT on ImageNet22k?

Thank you for your great work! As the title says, I would like to know about your ImageNet22k results. I saw a checkpoint of PVT_v2_b5 on imagenet_22k in your release. Is that useful?

opened by Roger-Liang 0
Question about cls token

Hi author! Thanks for your nice work.

I have a question about the cls token in PVT.

In ViT and DeiT, the cls token is appended during the input embedding step, but PVT appends the cls token at the input of the last stage.

Why doesn't PVT append the cls token during the input embedding step?

Thanks.

opened by eremo2002 0
without Convolutions?

The paper offers a convolution-free architecture, but the implementation contains convolutions. The PVTv2 paper says spatial reduction is done with a convolution, but I could not find that in PVTv1. Is there any other way to do it?

opened by Oguzhanercan 0
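
One way to look at the question above, as a sketch and not the authors' answer: a convolution whose kernel size equals its stride reads non-overlapping patches, so it computes the same result as reshaping each patch into a vector and applying a linear projection. The toy example below uses a single channel, no bias, and hypothetical function names to show that the two forms agree.

```python
def strided_conv(x, w, r):
    """Single-channel 2D convolution with kernel size r and stride r, no padding."""
    rows, cols = len(x) // r, len(x[0]) // r
    return [
        [
            sum(w[a][b] * x[i * r + a][j * r + b] for a in range(r) for b in range(r))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def reshape_then_linear(x, w, r):
    """Flatten each r x r patch into a vector, then apply one linear projection."""
    weights = [w[a][b] for a in range(r) for b in range(r)]
    out = []
    for i in range(len(x) // r):
        row = []
        for j in range(len(x[0]) // r):
            patch = [x[i * r + a][j * r + b] for a in range(r) for b in range(r)]
            row.append(sum(p * q for p, q in zip(patch, weights)))
        out.append(row)
    return out


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
w = [[1, 0], [0, -1]]  # arbitrary toy weights

conv_out = strided_conv(x, w, 2)
linear_out = reshape_then_linear(x, w, 2)
print(conv_out == linear_out, conv_out)
# prints: True [[-5, -5], [-5, -5]]
```

Under this reading, a strided convolution in code can be an implementation convenience for a patch-wise linear layer; whether that matches the authors' intent is for them to confirm.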
Question about pooling size

Hi @whai362

I was wondering why the pooling size is set to 7 for all stages? Have you tried a higher pooling size (e.g. more keys and values) for the initial stages while decreasing in later stages?

opened by magehrig 0
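
To make the trade-off in the pooling-size question concrete, here is a small arithmetic sketch. It assumes a 224x224 input and per-stage strides of 4, 8, 16, and 32 (both are assumptions, not stated in the issue); only the pooling size of 7 comes from the question. The function name `attention_sizes` is hypothetical.

```python
def attention_sizes(image_size=224, strides=(4, 8, 16, 32), pool_size=7):
    """Per stage: (stage, num queries, num pooled keys, query-key pairs)."""
    rows = []
    for stage, stride in enumerate(strides, start=1):
        side = image_size // stride          # feature-map side length at this stage
        queries = side * side                # one query per spatial position
        keys = pool_size * pool_size         # keys/values after pooling to pool_size^2
        rows.append((stage, queries, keys, queries * keys))
    return rows


for stage, queries, keys, pairs in attention_sizes():
    print(f"stage {stage}: queries={queries:5d} keys={keys} pairs={pairs}")
# stage 1: queries= 3136 keys=49 pairs=153664
# stage 2: queries=  784 keys=49 pairs=38416
# stage 3: queries=  196 keys=49 pairs=9604
# stage 4: queries=   49 keys=49 pairs=2401
```

With a fixed pooling size, the number of keys stays at 49 while the number of queries shrinks by 4x per stage, so a larger pooling size in early stages would raise cost most where the query count is already highest.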

