Authors: Renyuan Liu, Jian Yang, Xiaobing Zhou, Xiaoguang Yue
Affiliations: [1] School of Information Science and Engineering, Yunnan University, Kunming, 650500, China [2] Rattanakosin International College of Creative Entrepreneurship, Rajamangala University of Technology Rattanakosin, Nakhon Pathom, 73170, Thailand [3] Department of Computer Science and Engineering, School of Sciences, European University Cyprus, Nicosia, 1516, Cyprus [4] CIICESI, ESTG, Politécnico do Porto, Felgueiras, 4610-156, Portugal
Source: Computer Modeling in Engineering & Sciences (English-language journal), 2023, Issue 8, pp. 1259-1276 (18 pages)
Funding: Supported by the National Key R&D Program of China (2020AAA0107901).
Abstract: Latent information is difficult to extract from text in speech synthesis. Studies show that features derived from speech carry additional information that helps text encoding. Work on speech encoding has followed two directions: encoding speech frame by frame, and encoding a whole utterance into a single vector. In both, the scale is fixed, so encoding speech at an adjustable scale to capture more latent information is worth investigating. Current alignment approaches, however, support only frame-by-frame encoding and speech-to-vector encoding, and it remains a challenge to devise an alignment approach that supports adjustable-scale speech encoding. This paper presents a dynamic speech encoder with a new alignment approach that works in conjunction with frame-by-frame encoding and speech-to-vector encoding. The speech feature from our model serves three functions. First, it can reconstruct the original speech while its length equals the text length. Second, the model can obtain a text embedding from speech, and the encoded speech feature is similar to the text embedding result. Finally, it can transfer the style of synthesized speech and make it more similar to a given reference speech.
Keywords: Speech synthesis; dynamic framing; convolution network; speech encoding
Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]