Video Summarization Method Based on Spatiotemporal Slice and Dual Attention Mechanism  Cited by: 1


Authors: ZHANG Yunzuo (张云佐); GUO Yaning (郭亚宁); LI Wenbo (李文博)

Affiliation: [1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China

Source: Journal of Xi'an Jiaotong University (西安交通大学学报), 2022, No. 12, pp. 127-135 (9 pages)

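Funding: Central Government Guided Local Science and Technology Development Fund (226Z0501G); Natural Science Foundation of Hebei Province (F2022210007); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022100).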

Abstract: Existing video summarization methods extract video frame feature information insufficiently, and their results rely too heavily on a single feature. To address these problems, a video summarization method that fuses spatiotemporal slices with a dual attention mechanism is proposed. First, to segment the original video accurately, a kernel temporal segmentation algorithm based on the spatiotemporal slice (STS-KTS) is proposed: video scene information is represented as spatiotemporal slice texture information, and the horizontal mapping method projects the preprocessed spatiotemporal slice into a one-dimensional array that serves as the input feature of KTS. Second, a spatiotemporal feature extraction network is built from the dual attention mechanism and grouped convolution as basic components, combined with BiLSTM, to quickly extract rich spatiotemporal features; together with the texture features, this removes the excessive dependence of existing summarization models on a single feature. Third, a frame parameter prediction module obtains the best contribution score, centrality score, and sequence position for each video frame. Finally, frame scores are converted into shot scores to select content-rich segments and generate a dynamic video summary. Experiments on the SumMe and TVSum datasets show that the proposed method improves the accuracy of the generated summaries and outperforms existing methods; on SumMe in particular, summary accuracy improves by 0.58% over existing methods.
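The abstract names two data-flow steps that are easy to picture in code: projecting a spatiotemporal slice into a one-dimensional array, and turning frame scores into shot scores for segment selection. The sketch below is illustrative only and is not the authors' implementation. All function names are hypothetical. It assumes a slice is one fixed pixel row sampled from every grayscale frame, that the horizontal mapping is a per-time-step mean, that a shot score is the mean of its frame scores, and that shots are chosen by a 0/1 knapsack under a frame budget (a common convention in video summarization, not stated in this record). The KTS segmentation and the attention/BiLSTM network are omitted.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # grayscale frame: rows of pixel intensities


def horizontal_slice(frames: Sequence[Frame], row: int) -> List[List[float]]:
    """Spatiotemporal slice (assumed form): the same pixel row from every frame.

    The result has one line per frame (time axis) and one column per pixel.
    """
    return [list(frame[row]) for frame in frames]


def project_to_1d(slice_2d: Sequence[Sequence[float]]) -> List[float]:
    """Assumed horizontal mapping: collapse each time step to its mean intensity."""
    return [sum(line) / len(line) for line in slice_2d]


def shot_scores(frame_scores: Sequence[float],
                shots: Sequence[Tuple[int, int]]) -> List[float]:
    """Mean frame score per shot; each shot is a half-open (start, end) range."""
    return [sum(frame_scores[s:e]) / (e - s) for s, e in shots]


def select_shots(scores: Sequence[float], lengths: Sequence[int],
                 budget: int) -> List[int]:
    """0/1 knapsack: indices of shots maximising total score within `budget` frames."""
    n = len(scores)
    # best[i][c] = best total score using the first i shots and at most c frames
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]
            if lengths[i - 1] <= c:
                candidate = best[i - 1][c - lengths[i - 1]] + scores[i - 1]
                if candidate > best[i][c]:
                    best[i][c] = candidate
    # Walk back through the table to recover which shots were taken.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)
```

In such a pipeline, the one-dimensional array from `project_to_1d` would be handed to a temporal segmentation step to obtain the shot boundaries, and `budget` would be set to a fixed fraction of the video length.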

Keywords: video summarization; spatiotemporal slice; dual attention mechanism; spatiotemporal feature extraction; deep learning

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
