Hybrid Semantic Concept Temporal Pooling for Large-Scale Video Event Analysis  

Authors: LIU Wu, MA Huadong

Affiliation: [1] Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China

Source: Chinese Journal of Electronics (电子学报(英文版)), 2017, No. 6, pp. 1125-1131 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 61602049); the National Key Research and Development Plan (No. 2016YFC0801005); the Funds for Creative Research Groups of China (No. 61421061); the Beijing Training Project for the Leading Talents in S&T (No. ljrc 201502); the CCF-Tencent Open Research Fund (No. AGR20160113)

Abstract: To solve the task of detecting and recounting events in videos with limited training examples, we propose a novel two-stage hybrid concept temporal pooling approach that is aware of potential concept drift in the video stream. We first partition videos into temporal pyramids consisting of keyframes. Semantic concepts are then detected in the keyframes, which enables us to derive aggregated detection scores for each temporal pyramid using average-pooling and ultimately for the entire video via max-pooling. Owing to this refined hybrid pooling, our method yields more discriminative semantic representations with respect to the event query. We also develop an effective filtering strategy to cope with noisy concept detectors, making the textual description generation in recounting more robust. Experiments on the large-scale TRECVID MEDTest dataset demonstrate that our method improves accuracy over state-of-the-art methods for both event detection and recounting.
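The two-stage pooling described in the abstract (average-pooling within temporal segments, then max-pooling over the whole video) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_segments` and `hybrid_pool`, the `levels` parameter, and the choice of splitting level `l` of the pyramid into `2**l` contiguous segments are all assumptions made for illustration, since the abstract does not specify how the pyramid is built.

```python
from typing import List, Sequence


def split_segments(num_frames: int, num_parts: int) -> List[range]:
    """Split keyframe indices into contiguous, near-equal parts (assumed layout)."""
    bounds = [(i * num_frames) // num_parts for i in range(num_parts + 1)]
    # Drop empty parts, which occur when there are fewer frames than parts.
    return [range(bounds[i], bounds[i + 1])
            for i in range(num_parts) if bounds[i + 1] > bounds[i]]


def hybrid_pool(scores: Sequence[Sequence[float]], levels: int = 2) -> List[float]:
    """Average-pool concept scores per segment, then max-pool over all segments.

    scores[t][c] is the (hypothetical) detector score of concept c on keyframe t.
    Level l of the assumed pyramid has 2**l segments; level 0 is the whole video.
    """
    if not scores:
        raise ValueError("need at least one keyframe")
    num_frames = len(scores)
    num_concepts = len(scores[0])
    pooled = [float("-inf")] * num_concepts
    for level in range(levels):
        for seg in split_segments(num_frames, 2 ** level):
            for c in range(num_concepts):
                # Stage 1: average-pooling inside one temporal segment.
                seg_mean = sum(scores[t][c] for t in seg) / len(seg)
                # Stage 2: max-pooling across segments for the video-level score.
                pooled[c] = max(pooled[c], seg_mean)
    return pooled


# Toy example: concept 0 fires only in the first half of the video.
frame_scores = [[0.9, 0.1], [0.8, 0.1], [0.1, 0.2], [0.0, 0.2]]
video_scores = hybrid_pool(frame_scores, levels=2)
print(video_scores)  # roughly [0.85, 0.2]
```

In the toy example, plain averaging over the whole video would give concept 0 a score of about 0.45, whereas the hybrid scheme keeps the stronger segment-level response. This is one way to read the abstract's claim that the pooling is aware of concept drift over time.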

Keywords: Event detection; Event recounting; Semantic representation; Hybrid temporal pooling; Concept filtering

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
