Visual-Attention-Network (MMPretrain 1.2.0 documentation)

Abstract

While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms state-of-the-art vision transformers and convolutional neural networks by a large margin in extensive experiments, including image classification, object detection, semantic segmentation, and instance segmentation.
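Challenge (2) can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that an image is split into non-overlapping 16x16 patches (a common vision-transformer choice, not something stated in the abstract) and counts the entries in one full token-by-token attention matrix. The function names are hypothetical.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch tokens (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def attention_entries(height: int, width: int, patch: int = 16) -> int:
    """Entries in one full self-attention matrix: one score per token pair."""
    n = num_tokens(height, width, patch)
    return n * n


# Doubling the image side quadruples the tokens and multiplies the matrix by 16.
for side in (224, 448, 896):
    tokens = num_tokens(side, side)
    entries = attention_entries(side, side)
    print(f"{side}x{side}: {tokens} tokens, {entries:,} attention entries")
```

Under these assumptions a 224x224 image yields 196 tokens, and each doubling of the side length multiplies the attention matrix size by 16, which is the quadratic growth the abstract refers to.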

How to use it?

from mmpretrain import inference_model

predict = inference_model('van-tiny_3rdparty_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
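The call above can be wrapped in a small helper. The following is a minimal sketch, not part of MMPretrain: `format_prediction` and `classify` are hypothetical names, and the percentage formatting assumes `pred_score` is a probability between 0 and 1. Only `inference_model` and the two result keys shown above are taken from the documentation.

```python
def format_prediction(result: dict) -> str:
    """Render the predicted class and score as one line.

    Assumes result['pred_score'] is a probability in [0, 1].
    """
    return f"{result['pred_class']} ({float(result['pred_score']):.2%})"


def classify(image_path: str, model_name: str = "van-tiny_3rdparty_in1k") -> str:
    """Run inference with a VAN checkpoint and return a formatted summary."""
    # Imported lazily so format_prediction works without mmpretrain installed.
    from mmpretrain import inference_model

    return format_prediction(inference_model(model_name, image_path))


# Example (requires mmpretrain and access to the pretrained checkpoint):
# print(classify("demo/bird.JPEG"))
```

Swapping `model_name` for any other name listed in the table below should select a larger VAN variant.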

Models and results

Image Classification on ImageNet-1k

Model	Pretrain	Params (M)	FLOPs (G)	Top-1 (%)	Top-5 (%)	Config	Download
van-tiny_3rdparty_in1k*	From scratch	4.11	0.88	75.41	93.02	config	model
van-small_3rdparty_in1k*	From scratch	13.86	2.52	81.01	95.63	config	model
van-base_3rdparty_in1k*	From scratch	26.58	5.03	82.80	96.21	config	model
van-large_3rdparty_in1k*	From scratch	44.77	8.99	83.86	96.73	config	model

Models with * are converted from the official repo. The config files of these models are only for inference. We haven’t reproduced the training results.
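When choosing among the four variants, the table can be treated as data. The sketch below copies the Params, FLOPs, and Top-1 columns from the table and picks the lowest-FLOPs model that meets a target accuracy. The `cheapest_model` helper and the thresholds in the example calls are illustrative assumptions, not part of MMPretrain.

```python
# (name, params in M, FLOPs in G, top-1 %) copied from the table above.
VAN_MODELS = [
    ("van-tiny_3rdparty_in1k", 4.11, 0.88, 75.41),
    ("van-small_3rdparty_in1k", 13.86, 2.52, 81.01),
    ("van-base_3rdparty_in1k", 26.58, 5.03, 82.80),
    ("van-large_3rdparty_in1k", 44.77, 8.99, 83.86),
]


def cheapest_model(min_top1: float, max_gflops: float = float("inf")):
    """Return the lowest-FLOPs model name meeting both constraints, or None."""
    candidates = [
        m for m in VAN_MODELS if m[3] >= min_top1 and m[2] <= max_gflops
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m[2])[0]


print(cheapest_model(min_top1=80.0))
print(cheapest_model(min_top1=83.0, max_gflops=6.0))
```

The returned name can be passed directly to `inference_model`. A `None` result means no listed variant satisfies both constraints.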

Citation

@article{guo2022visual,
  title={Visual Attention Network},
  author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={arXiv preprint arXiv:2202.09741},
  year={2022}
}