Audio-guided self-supervised learning for disentangled visual speech representations  


Authors: Dalu FENG, Shuang YANG, Shiguang SHAN, Xilin CHEN

Affiliations: [1] Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Frontiers of Computer Science, 2024, Issue 6, pp. 277-279 (3 pages)

Funding: National Natural Science Foundation of China (Grant Nos. 62276247, 62076250).

Abstract: 1 Introduction. Learning visual speech representations from talking face videos is an important problem for several speech-related tasks, such as lip reading, talking face generation, and audiovisual speech separation [1,2]. The key difficulty lies in handling the speech-irrelevant factors present in the videos, such as lighting, resolution, viewpoint, and head motion.
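To make the idea of audio-guided self-supervised learning more concrete, below is a minimal, hypothetical PyTorch sketch: a visual encoder and an audio encoder are trained with an InfoNCE-style contrastive objective so that visual features align with the co-occurring audio (the shared speech content) rather than with speech-irrelevant appearance factors. The encoder architectures, feature dimensions, and loss choice are illustrative assumptions, not the authors' actual method.

# Hypothetical sketch: audio-guided contrastive alignment of visual and audio features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleEncoder(nn.Module):
    """Maps an input sequence (B, T, D_in) to one L2-normalized embedding per clip (B, D_out)."""

    def __init__(self, d_in: int, d_out: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(), nn.Linear(512, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Temporal mean pooling followed by L2 normalization.
        return F.normalize(self.net(x).mean(dim=1), dim=-1)


def audio_guided_contrastive_loss(v: torch.Tensor, a: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each video clip should match its own audio more
    closely than the audio of other clips in the batch."""
    logits = v @ a.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    video_encoder = SimpleEncoder(d_in=512)       # e.g., per-frame lip-region features (assumed)
    audio_encoder = SimpleEncoder(d_in=80)        # e.g., log-mel spectrogram frames (assumed)

    video_feats = torch.randn(8, 25, 512)         # batch of 8 clips, 25 video frames each
    audio_feats = torch.randn(8, 100, 80)         # matching audio, 100 mel frames each

    loss = audio_guided_contrastive_loss(video_encoder(video_feats), audio_encoder(audio_feats))
    print(f"contrastive loss: {loss.item():.4f}")

Because the audio track carries the spoken content but none of the visual nuisance factors (lighting, viewpoint, head motion), pulling visual embeddings toward their paired audio is one common way to encourage speech-relevant, appearance-invariant visual features.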

Keywords: SUCH; DIS; VISUAL

Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]; TN912.3 [Automation and Computer Technology - Computer Science and Technology]

 
