PVT v2: Improved baselines with Pyramid Vision Transformer    Cited by: 83


Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Affiliations: [1] Shanghai AI Laboratory, Shanghai 200232, China; [2] Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China; [3] Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China; [4] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210014, China; [5] Computer Vision Lab, ETH Zurich, Zurich 8092, Switzerland; [6] SenseTime, Beijing 100080, China; [7] Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates

Source: Computational Visual Media, 2022, No. 3, pp. 415-424 (10 pages)

Funding: National Natural Science Foundation of China (Grant Nos. 61672273 and 61832008); Science Foundation for Distinguished Young Scholars of Jiangsu (Grant No. BK20160021); Postdoctoral Innovative Talent Support Program of China (Grant Nos. BX20200168 and 2020M681608); General Research Fund of Hong Kong (Grant No. 27208720).

Abstract: Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and delivers significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin Transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
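Design (i) in the abstract, the linear-complexity attention layer, works by average-pooling the keys and values down to a fixed spatial size before attention, so the cost grows linearly with the number of input tokens rather than quadratically. A minimal single-head NumPy sketch of that idea follows; the function names, the unbatched form, and the omission of learned projections are illustrative simplifications, not the paper's actual implementation:

```python
import numpy as np

def pool_to_fixed(x, hw, out=7):
    """Average-pool (hw*hw, c) tokens on an hw x hw grid down to out*out tokens."""
    n, c = x.shape
    grid = x.reshape(hw, hw, c)
    bins_h = np.array_split(np.arange(hw), out)  # row indices per pooling bin
    bins_w = np.array_split(np.arange(hw), out)  # column indices per pooling bin
    # average each of the out x out spatial bins -> (out*out, c)
    return np.stack([grid[np.ix_(bh, bw)].mean(axis=(0, 1))
                     for bh in bins_h for bw in bins_w])

def linear_sra_attention(x, hw, pool=7):
    """Queries attend over pooled keys/values: O(n * pool^2) instead of O(n^2)."""
    q = x
    kv = pool_to_fixed(x, hw, pool)                 # fixed-size keys/values
    scores = q @ kv.T / np.sqrt(x.shape[1])         # (n, pool*pool) logits
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ kv                                   # (n, c) attended output
```

For a 14 x 14 token grid (196 tokens), each query attends over only 49 pooled tokens, so the attention matrix is 196 x 49 rather than 196 x 196; the pooled size stays constant as the input resolution grows, which is what makes the overall cost linear.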

Keywords: transformers; dense prediction; image classification; object detection; semantic segmentation

Classification: TP391.41 [Automation and Computer Technology — Computer Application Technology]

 
