Video captioning based on scene representation object features syntax analysis  Cited by: 1

Authors: FU Yan[1]; WANG Mi-mi; YE Ou (College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China)

Affiliation: [1] College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 2, pp. 488-493 (6 pages)

Funding: Natural Science Foundation of Shaanxi Province (2018JQ5095); China Postdoctoral Science Foundation (2020M673446).

Abstract: Video captioning methods based on the encoder-decoder framework overlook syntax analysis of the extracted features, which leaves the generated sentences with an unclear syntactic structure. To address this problem, a video captioning method based on syntax analysis of object features in scene representation is proposed. In the encoding stage, the 2D features, C3D features, and object features of the video are combined with a self-attention mechanism to build a visual scene representation model that describes the dependencies among visual features. A visual object feature syntax analysis model is also built to determine the grammatical component that each object feature fills in the description sentence. In the decoding stage, the syntax analysis results are combined with an LSTM network model to output the video caption. Experiments on the MSVD and MSR-VTT datasets show that the proposed method performs well on the different evaluation metrics and that the generated captions have a clear syntactic structure.
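The encoding step described in the abstract, relating 2D, C3D, and object features through self-attention, can be illustrated with a minimal sketch. This is not the authors' model: it assumes all three feature types have already been projected to a common dimension, uses identity query/key/value projections where a real model would learn them, and every name in it (`self_attention`, `feat_2d`, `feat_c3d`, `feat_obj`) is hypothetical. The feature values are random toy data.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of n feature vectors, each of dimension d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        # Similarity of this token to every token, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all tokens.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy stand-ins (assumption): 2 appearance tokens, 1 motion token,
# and 3 object tokens, all in a shared 4-dimensional space.
random.seed(0)
d = 4
feat_2d = [[random.gauss(0, 1) for _ in range(d)] for _ in range(2)]
feat_c3d = [[random.gauss(0, 1) for _ in range(d)] for _ in range(1)]
feat_obj = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]

tokens = feat_2d + feat_c3d + feat_obj
scene, attn = self_attention(tokens)
```

Here `scene` holds one context-aware vector per input token, and each row of `attn` is a probability distribution over the tokens, which is one simple way to express the dependencies among visual features that the abstract mentions.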

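Keywords: video captioning; encoder-decoder model; feature extraction; self-attention mechanism; object features; visual scene representation; syntax analysis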

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
