检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:梅锋[1] 周娟平 陆璐[2] MEI Feng;ZHOU Juanping;LU Lu(Zhongshan Branch of Guangdong Broadcast&Video Network Co.,Ltd.,Zhongshan,Guangdong 528403,China;School of Computer Science and Engineering,South China University of Technology,Guangzhou 510006,China)
机构地区:[1]广东省广播电视网络股份有限公司中山分公司,广东中山528403 [2]华南理工大学计算机科学与工程学院,广州510006
出 处:《计算机工程与应用》2021年第11期211-218,共8页Computer Engineering and Applications
基 金:国家自然科学基金(61370103);广州市产学研重大项目(201902920004)。
摘 要:技术的目的是在缩短视频长度的同时,概括视频的主要内容,这样可以极大地节省人们浏览视频的时间。视频摘要技术的一个关键步骤是评估生成摘要的性能,现有的大多数方法是基于整个视频进行评估。然而,基于整个视频序列进行评估的计算成本很高,特别是对于长视频。而且在整个视频上评估生成摘要往往忽略了视频数据固有的时序关系,导致生成摘要缺乏故事情节的逻辑性。因此,提出了一个关注局部信息的视频摘要网络,称为自注意力和局部奖励视频摘要网络(ALRSN)。确切地说,该模型采用自注意力机制预测视频帧的重要性分数,然后通过重要性分数生成视频摘要。为了评估生成摘要的性能,进一步设计了一个局部奖励函数,同时考虑了视频摘要的局部多样性和局部代表性。该函数将生成摘要映射回原视频,并在局部范围内评估摘要的性能,使其具有原视频的时序结构。通过在局部范围内获得更高的奖励分数,使模型生成更多样化、更具代表性的视频摘要。综合实验表明,在两个基准数据集SumMe和TvSum上,ALRSN模型优于现有方法。Video summarization aims to shorten the length of the video while preserving the main content,eminently saving time of browsing videos.A key step of video summarization is to evaluate the performance of generated summaries,whereas most existing methods focus on evaluating it based on the whole video.However,evaluation based on the entire video sequence is computationally expensive,especially for long videos.Moreover,the evaluation of the generated summary on the entire video often ignores the inherent temporal relationship of the video data,which leads to the lack of logic of the storyline.It thereby proposes a novel framework for video summarization called Attentive Local Reward Summarization Network(ALRSN).To be precise,the model performs frame-level important score predictions through a self-attention mechanism.To evaluate the performance of generated summaries,it further designs a local reward function that jointly accounts for both the local diversity and local representativeness.The generated summary maps to the original video and evaluates the performance in a local scope,therefore it has the temporal relationship.In addition,the local reward function encourages the model to produce a more diverse and representative summary in the local scope,thereby obtaining a higher reward.The comprehensive experiment on two benchmark datasets,SumMe and TvSum,shows that the ALRSN model is superior to the state-of-the-art methods.
分 类 号:TP302.7[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.249