Dance2MIDI: Dance-driven multi-instrument music generation

Authors: Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Affiliations: [1] College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; [2] National University of Singapore, Singapore 119077, Singapore; [3] Speech & Audio Team, Bytedance AI Lab, Singapore 048583, Singapore; [4] Department of Film and TV Technology, Beijing Film Academy, Beijing 100088, China

Source: Computational Visual Media, 2024, Issue 4, pp. 791-802 (12 pages)

Funding: Supported by the National Social Science Foundation Art Project (No. 20BC040) and a China Scholarship Council (CSC) grant (No. 202306320525).

Abstract: Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw-audio generation, while the multi-instrument scenario remains under-explored. The challenges of dance-driven multi-instrument music (MIDI) generation are twofold: (i) the lack of a publicly available paired multi-instrument MIDI and video dataset, and (ii) the weak correlation between music and video. To tackle these challenges, we built the first paired multi-instrument MIDI and dance dataset (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. First, to capture the relationship between dance and music, we employ a graph convolutional network to encode the dance motion, extracting features related to dance movement and dance style. Second, to generate a harmonious rhythm, we use a transformer model with a cross-attention mechanism to decode the drum-track sequence. Third, we model generation of the remaining tracks, conditioned on the drum track, as a sequence understanding and completion task: a BERT-like model learns the context of the entire piece through self-supervised learning. We evaluate music generated by our framework trained on the D2MIDI dataset and show that our method achieves state-of-the-art performance.
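
The first two stages described in the abstract (a graph convolutional encoder over skeleton joints, followed by an autoregressive transformer decoder that cross-attends to motion features to emit drum-track tokens) can be sketched in PyTorch. This is a minimal illustration, not the authors' released implementation: the module names, hidden sizes, token vocabulary, joint count, and the toy adjacency matrix are all our own assumptions.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One spatial graph convolution over skeleton joints: H = ReLU(A X W)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)               # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                              # x: (B, T, J, C)
        x = torch.einsum("ij,btjc->btic", self.adj, x)  # mix neighboring joints
        return torch.relu(self.proj(x))

class DanceEncoder(nn.Module):
    """Stacked graph convolutions, then joint pooling: one feature per frame."""
    def __init__(self, adj, joint_dim=3, hidden=256):
        super().__init__()
        self.gcn1 = GraphConv(joint_dim, hidden, adj)
        self.gcn2 = GraphConv(hidden, hidden, adj)

    def forward(self, pose):                           # pose: (B, T, J, 3) keypoints
        h = self.gcn2(self.gcn1(pose))
        return h.mean(dim=2)                           # (B, T, hidden) motion features

class DrumDecoder(nn.Module):
    """Autoregressive transformer decoder that cross-attends to motion features."""
    def __init__(self, vocab=512, hidden=256, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerDecoderLayer(hidden, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, layers)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, drum_tokens, motion):            # tokens: (B, S); motion: (B, T, H)
        tgt = self.embed(drum_tokens)
        S = tgt.size(1)
        causal = torch.triu(torch.full((S, S), float("-inf"),
                                       device=tgt.device), diagonal=1)
        out = self.decoder(tgt, motion, tgt_mask=causal)  # cross-attention to motion
        return self.head(out)                          # (B, S, vocab) next-token logits

# Toy usage: 17 joints and a placeholder adjacency, for shape-checking only.
J = 17
adj = torch.eye(J) + torch.ones(J, J) / J
enc, dec = DanceEncoder(adj), DrumDecoder()
motion = enc(torch.randn(2, 120, J, 3))                # 2 clips, 120 frames
logits = dec(torch.randint(0, 512, (2, 64)), motion)   # drum-token logits
```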
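The third stage, framed in the abstract as sequence understanding and completion, resembles BERT-style masked-token training: a bidirectional encoder reads the full multi-track token sequence with some positions hidden and is trained to recover them. The sketch below assumes a single flattened MIDI-event token stream and a reserved mask token id; the mask rate and model size are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrackCompleter(nn.Module):
    """Bidirectional (BERT-like) encoder that fills in masked track tokens."""
    def __init__(self, vocab=512, hidden=256, layers=4, heads=4, mask_id=0):
        super().__init__()
        self.mask_id = mask_id
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)  # no causal mask
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens):                         # tokens: (B, S) event ids
        return self.head(self.encoder(self.embed(tokens)))

def masked_completion_step(model, tokens, mask_rate=0.15):
    """Self-supervised step: hide random positions, predict the originals."""
    corrupted = tokens.clone()
    hide = torch.rand(tokens.shape, device=tokens.device) < mask_rate
    corrupted[hide] = model.mask_id                    # replace with mask id
    logits = model(corrupted)
    return F.cross_entropy(logits[hide], tokens[hide])  # loss at masked slots only

# Toy usage on random token sequences (ids 1..511, 0 reserved for the mask).
model = TrackCompleter()
loss = masked_completion_step(model, torch.randint(1, 512, (2, 64)))
```

Computing the loss only at masked positions is the standard masked-language-model objective; at inference time, the non-drum track positions would be filled with mask tokens and predicted conditioned on the already generated drum track.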

Keywords: video understanding; music generation; symbolic music; cross-modal learning; self-supervision

Classification: TP391.41 (Automation and Computer Technology / Computer Application Technology)
