Emotion-Aware Music Driven Movie Montage

Authors: Wu-Qin Liu (刘伍琴); Min-Xuan Lin (林敏轩); Hai-Bin Huang (黄海斌); Chong-Yang Ma (马重阳); Yu Song (宋玉); Wei-Ming Dong (董未名); Chang-Sheng Xu (徐常胜)

Affiliations: [1] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China; [2] The State Key Laboratory of Multimodal Artificial Intelligence System (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [3] Kuaishou Technology, Beijing 100085, China; [4] School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2023, Issue 3, pp. 540-553 (14 pages)

Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020AAA0106200 and the National Natural Science Foundation of China under Grant No. 61832016.

Abstract: In this paper, we present Emotion-Aware Music Driven Movie Montage, a novel paradigm for the challenging task of generating movie montages. Specifically, given a movie and a piece of music as guidance, our method aims to generate a montage from the movie that is emotionally consistent with the music. Unlike previous work such as video summarization, this task requires not only video content understanding but also emotion analysis of both the input movie and the music. To this end, we propose a two-stage framework comprising a learning-based module for predicting emotion similarity and an optimization-based module for selecting and composing candidate movie shots. The core of our method is to align music clips and movie shots in a multi-modal latent space via contrastive learning and to estimate the emotional similarity between them. Subsequently, montage generation is modeled as a joint optimization of emotion similarity and additional constraints such as scene-level story completeness and shot-level rhythm synchronization. We conduct both qualitative and quantitative evaluations to demonstrate that our method generates emotionally consistent montages and outperforms alternative baselines.

Keywords: movie montage; emotion analysis; audio-visual modality; contrastive learning

Classification: J932 [Arts: Film and Television Arts]
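
The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architecture, loss, or solver details, so the symmetric InfoNCE loss, the `temperature` value, the toy 2-D embeddings, and the function names `cosine`, `info_nce`, and `pick_shots` are all assumptions chosen for illustration. The greedy selection is a simple stand-in for the joint optimization and ignores the story-completeness and rhythm constraints.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(music, shots, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form); music[i] and shots[i] are the positive pair."""
    n = len(music)
    logits = [[cosine(m, s) / temperature for s in shots] for m in music]

    def mean_cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]  # shot-to-music direction
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(columns))


def pick_shots(music, shots):
    """Greedy stand-in for stage two: each music clip takes the most similar unused shot.

    Assumes there are at least as many shots as music clips.
    """
    used, order = set(), []
    for m in music:
        best = max((j for j in range(len(shots)) if j not in used),
                   key=lambda j: cosine(m, shots[j]))
        used.add(best)
        order.append(best)
    return order


# Hypothetical embeddings: two music clips and three candidate shots.
music = [[1.0, 0.0], [0.0, 1.0]]
shots = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
montage = pick_shots(music, shots)
```

In this toy setting the loss is lower when each music clip is paired with the shot that points in the same direction, which is the property a contrastively trained latent space is meant to have; a real system would obtain the embeddings from learned audio and video encoders.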
