Authors: ZENG Yunyun (曾芸芸), ZHANG Hongying (张红英)[1,2], YUAN Mingdong (袁明东)
Affiliations: [1] School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; [2] Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 20, pp. 224-232 (9 pages)
Funding: National Natural Science Foundation of China (61872304).
Abstract: Crowd counting has important applications in public safety management, public space design, and other visual tasks such as behavior analysis and congestion analysis. However, complex backgrounds and widely varying head scales limit counting accuracy. To address scale variation and background interference in static images, a crowd counting network based on dual-branch intermediate feature extraction, DBFE_MFNet, is proposed. The network follows an encoder-decoder structure. The encoder uses the first 16 layers of the VGG19 convolutional neural network; to better fuse multi-scale information, the last 4 convolutions of those 16 layers are replaced with dilated convolutions with a dilation rate of 2. The decoder uses a residual convolutional attention module (RCAM) to suppress background interference. A dual-branch intermediate feature extraction module (DBFE) is inserted between the encoder and the decoder: branch 1 adopts a pyramid structure combined with a position attention module to extract multi-scale contextual information, and branch 2 also adopts a pyramid structure, combined with a dual-channel attention mechanism, so that the model attends to heads of different sizes. Finally, a 1×1 convolution generates the density map. Comparison experiments were carried out on the ShanghaiTech PartA, ShanghaiTech PartB, and Mall datasets, where DBFE_MFNet achieves mean absolute errors of 63.2, 7.1, and 1.80 and root mean square errors of 99.2, 11.8, and 2.28, respectively. The comparison indicates that the model has good counting accuracy and stability. Ablation experiments on ShanghaiTech PartB verify the effectiveness of each module.
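Keywords: crowd counting; VGG19; encoder-decoder; residual convolutional attention module; dual-branch intermediate feature extraction module
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]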
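Note on the reported metrics: the abstract gives mean absolute error (MAE) and root mean square error (RMSE) without defining them. The following is a minimal sketch, assuming the per-image definitions commonly used in crowd counting and assuming the predicted count is the sum of the density map. The function names and sample values are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density):
    """Assumed convention: the predicted count is the sum over the density map."""
    return sum(sum(row) for row in density)


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean square error between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical counts for two images, for illustration only.
    predicted_counts = [10, 20]
    true_counts = [12, 17]
    print(mae(predicted_counts, true_counts))
    print(rmse(predicted_counts, true_counts))
```

MAE reflects average counting accuracy, while RMSE penalizes large per-image errors more heavily, which is why it is often read as an indicator of stability.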