Affiliations: [1] School of Information, Beijing Forestry University, Beijing 100083; [2] School of Digital Media, Beijing Film Academy, Beijing 100088
Source: Journal of Image and Graphics (中国图象图形学报), 2018, No. 5, pp. 748-755 (8 pages)
Funding: National Natural Science Foundation of China (61703046; 31770589); Fundamental Research Funds for the Central Universities (2015ZCQ-XX)
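Abstract: Objective Video highlight extraction is an active research problem in video content annotation, content-based video retrieval, and related fields. Existing highlight extraction relies mainly on low-level video features and ignores how user interest should shape the results, so the extracted highlights may not match what users expect. On the other hand, modeling user-interest semantics requires many annotated training videos to obtain a reasonably robust semantic classifier, and annotating that many samples is time-consuming and labor-intensive. Because the Internet offers images that are rich in content and easy to obtain, transferring knowledge from Internet images into semantic models of video segments can eliminate much of the video annotation work. We therefore propose a user-interest video highlight extraction framework that exploits Internet images. Method A large number of Internet images are used to model user-interest semantics. Knowledge obtained from the Internet is diverse and noisy, and using it indiscriminately would degrade extraction performance, so the images are grouped by semantic similarity: images that are semantically similar but retrieved with different keywords form a synonym image group. On this basis, a synonym joint group weighting model is proposed that assigns each image group a weight according to its semantic relevance to the video. First, images semantically related to the user's interest are retrieved from Internet image search engines and serve as the knowledge source for highlight extraction. Next, joint group weight learning over the synonym image groups transfers the knowledge learned from the images to the videos. Finally, the semantic model learned from the image set is applied to the candidate segments to extract highlights. Result The method is evaluated on videos from the CCV database and compared with several existing video keyframe extraction algorithms. The proposed algorithm reaches an average precision of 46.54, a 21.6% improvement over the other algorithms, with no increase in running time. In addition, to study how the different balance parameters in the optimization affect the final result and to further verify the method's effectiveness, the regularization terms of the algorithm were removed during the experiments to verify each term's contribution to
Objective Video highlight extraction is of interest in video summarization, organization, browsing, and indexing. Current research mainly focuses on extraction by optimizing the low-level feature diversity or representativeness of video frames, ignoring the interests of users, which leads to extraction results that are inconsistent with user expectations. However, collecting the large number of labeled videos required to model different user interest concepts for different videos is time-consuming and labor-intensive. Method We propose to learn models for user interest concepts on different videos by leveraging numerous Web images, which cover many roughly annotated concepts and are often captured in a maximally informative manner, to alleviate the labeling burden. However, knowledge from the Web is noisy and diverse, so brute-force knowledge transfer may adversely affect highlight extraction performance. In this study, we propose a novel user-oriented keyframe extraction framework for online videos that leverages a large number of Web images queried by synonyms from image search engines. Our work is based on the observation that users may be interested in different frames when browsing the same video.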
By using user interest-related words as keywords, we can easily collect weakly labeled image data for training interest concept models. Because different users may describe the same interest concept differently, we call different descriptions with similar semantic meanings synonyms. When querying images from the Web, we use synonyms as keywords to avoid a one-sided semantic view. The image set returned by one synonym is treated as a synonym group. Different synonym groups are weighted according to their relevance to the video frames. Moreover, the group weights and classifiers are learned simultaneously through a joint synonym group optimization problem so that each benefits the other. We also exploit the unlabeled online videos to optimize the group weights and classifiers for buildin
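The abstract describes the joint synonym-group weighting only at a high level and gives no objective function, so the following is a hypothetical sketch, not the authors' formulation. It assumes precomputed feature vectors for images and video frames, a single shared linear classifier trained with squared loss, a relevance cost r_g defined as the squared distance between a group's mean feature and the mean feature of the unlabeled video frames, and an entropy regularizer on the group weights so that the weight update has a closed-form softmax. The names `joint_group_weights`, `group_relevance`, `extract_highlights` and the parameters `lam`, `gamma`, `tau` are illustrative inventions, and the data are synthetic.

```python
import numpy as np


def group_relevance(group_feats, video_feats):
    """Hypothetical relevance cost r_g: squared distance between the mean feature
    of one synonym group and the mean feature of the unlabeled video frames
    (a linear-kernel MMD proxy). Smaller values mean a more relevant group."""
    diff = group_feats.mean(axis=0) - video_feats.mean(axis=0)
    return float(diff @ diff)


def joint_group_weights(groups, video_feats, lam=1.0, gamma=1.0, tau=1.0, iters=20):
    """Alternately learn a shared linear classifier w and synonym-group weights d.

    groups: list of (X_g, y_g) pairs, one per synonym query, labels in {-1, +1}.
    Block coordinate descent on the (assumed) objective
        J(w, d) = sum_g d_g * (L_g(w) + gamma * r_g) + lam * ||w||^2
                  + tau * sum_g d_g * log(d_g),   with d on the probability simplex,
    where L_g(w) is the mean squared loss of w on group g.
    """
    n_groups = len(groups)
    dim = groups[0][0].shape[1]
    r = np.array([group_relevance(X, video_feats) for X, _ in groups])
    d = np.full(n_groups, 1.0 / n_groups)  # start from uniform group weights
    X_all = np.vstack([X for X, _ in groups])
    y_all = np.concatenate([y for _, y in groups])
    for _ in range(iters):
        # Fix d: weighted ridge regression; each sample in group g gets weight d_g / n_g.
        sw = np.concatenate([np.full(len(y), d[g] / len(y)) for g, (_, y) in enumerate(groups)])
        A = X_all.T @ (sw[:, None] * X_all) + lam * np.eye(dim)
        w = np.linalg.solve(A, X_all.T @ (sw * y_all))
        # Fix w: the entropy-regularized minimizer over the simplex is a softmax.
        losses = np.array([np.mean((y - X @ w) ** 2) for X, y in groups])
        z = -(losses + gamma * r) / tau
        d = np.exp(z - z.max())  # subtract the max for numerical stability
        d /= d.sum()
    return w, d


def extract_highlights(w, frame_feats, k):
    """Return the indices of the k frames with the highest interest scores."""
    scores = frame_feats @ w
    return np.argsort(-scores)[:k]


# Toy usage on synthetic 5-D features (feature dimension 0 plays the "interest concept").
rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 5))                 # on-topic synonym group
y0 = np.where(X0[:, 0] > 0, 1.0, -1.0)
X1 = rng.normal(loc=3.0, size=(200, 5))        # off-topic group: shifted features
y1 = rng.choice([-1.0, 1.0], size=200)         # and labels unrelated to the concept
video = rng.normal(size=(50, 5))
video[:3] = 0.0
video[:3, 0] = 10.0                            # three frames that strongly show the concept
w, d = joint_group_weights([(X0, y0), (X1, y1)], video)
top = extract_highlights(w, video, k=3)
print("group weights:", d.round(3))
print("highlight frames:", sorted(top.tolist()))
```

Each step minimizes the assumed objective exactly in one block of variables, so J never increases across iterations. In the toy run the off-topic group receives near-zero weight and frames 0, 1, and 2 are returned as highlights, which mirrors the qualitative behavior the abstract describes; the paper's actual objective, features, and regularizers may differ.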
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]