Authors: Qiongyi Zhou, Changde Du, Huiguang He
Affiliations: [1] Research Center for Brain-inspired Intelligence and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China; [3] Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100190, China
Source: Machine Intelligence Research (English edition), 2022, No. 5, pp. 439-455 (17 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 61976209 and 62020106015); the CAS International Collaboration Key Project, China (No. 173211KYSB20190024); and the Strategic Priority Research Program of CAS, China (No. XDB32040000).
Abstract: Nowadays, deep neural networks (DNNs) have been equipped with powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanism of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a striking DNN paradigm and have achieved remarkable improvements over CNNs on many vision tasks. It is natural to ask how brain-like ViTs are. Beyond the model paradigm, we are also interested in how factors such as model size, multimodality, and temporality affect the ability of networks to model the human visual pathway, especially considering that existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 kinds of computer vision models, ranging from CNNs and ViTs to their hybrids, from the perspective of explaining the brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither the CNN nor the transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway, as CNNs do. Moreover, we find that multi-modal and temporal networks can better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on the design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
Keywords: convolutional neural network (CNN); vision transformer (ViT); multi-modal networks; spatial-temporal networks; visual neural encoding; brain-like neural networks
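Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Computer Science and Technology]; R318 [Medicine and Health: Biomedical Engineering]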