SE, the Basic Running Unit of Language in Continuous Speech: Experimental Evidence from Tone Sandhi in Lhasa Tibetan

Sense Element in Continuous Speech: Evidence from Lhasa Tibetan Speech Synthesis


Authors: ZU Yiqing (祖漪清); LU Chen (陆晨); Ngodrup (欧珠); ZHU Ronghua (朱荣华); LIU Chenning (刘晨宁); SHAO Pengfei (邵鹏飞); Klu’Bum Thr (录布塔); ZHANG Xiao (张校); HU Guoping (胡国平) (iFLYTEK Corporation; Interdisciplinary Research Center for Linguistic Science, University of Science and Technology of China; not specified)

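Affiliations: [1] iFLYTEK Co., Ltd. [2] Interdisciplinary Research Center for Linguistic Science, University of Science and Technology of China [3] College of Liberal Arts, Jinan University [4] Xizang Minzu University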

Source: Contemporary Linguistics (《当代语言学》), 2022, No. 4, pp. 515-532 (18 pages)

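Funding: 2008 project "R&D and Industrialization of a Minority-Language Speech Platform Software Centered on Tibetan", Electronic Information Industry Development Fund Management Office, Ministry of Industry and Information Technology (Caijian [2008] No. 329; MIIT [2008] No. 97); project "Intelligent Speech Technology and Product R&D and Productization: R&D of Intelligent Speech Technology and Systems for Minority Languages" (Caijian [2014] No. 513; MIIT [2014] No. 425).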

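Abstract: In tonal languages with tone sandhi, sandhi binds several semantic granules into a whole and thereby makes the grammatical form of speech explicit. Tone sandhi and citation tones together make up the tonal system of such languages. The single-syllable domain and the sandhi domain reflect linguistic units that differ in syllable count but share the same hierarchical status in the two processes of actual language use, production and comprehension; we call such a unit the sense element (SE), the basic running unit of language in continuous speech. The semantic concepts expressed by a sentence are multidimensional, whereas sound can only unfold linearly along the time dimension. In continuous speech, SEs are uttered one after another, so they operate at the same working level even though their internal grammatical structures differ. A sequence-to-sequence speech synthesis experiment on Lhasa Tibetan shows that using SEs as input units yields an MOS of 4.25, which is 0.82 points higher than a system using traditional dictionary words as input. The experiment demonstrates that the SE is the basic unit through which the meaning-distinguishing function of Lhasa Tibetan tonemes is properly realized. To segment text into SEs, we annotated a Lhasa Tibetan speech database at two levels: 1) tonal annotation: tone values and sandhi domains; 2) grammatical annotation: word segmentation, part-of-speech tagging, and annotation of the meaning or function of syntactic function words. Aligning the tonal and grammatical annotations shows that when a homographic single character plays different roles in a sentence, it expresses the contrast in grammatical function through three tonal patterns: citation tone, changed citation tone, and tone loss. Homographic disyllables express contrasts in the semantic relation and grammatical structure between the two characters through three tonal forms: sandhi, no sandhi, and citation tone plus tone loss. Only when the input text is segmented into SEs, or even larger structures, can text and speech be brought into correspondence and speech synthesis be carried out correctly. Speech synthesis is not only an application system but also a new research platform for experimental linguistics.

In tonal languages, semantic elements can be combined into a unit characterized by tone sandhi. A multisyllabic unit with tone sandhi has the same status as a monosyllabic element. Such a speech unit is termed the sense element, or SE. The semantic concepts expressed by utterances are multidimensional, whereas sounds unfold linearly in the temporal dimension; as a result, SEs operate at the same working level even though their grammatical structures differ. The SE, as a running element in the process of language production and comprehension, is uttered one after another in speech. An MOS (mean opinion score) rating is conducted to compare the synthesized Lhasa Tibetan speech generated by two models: the dictionary-entry model and the SE-entry model. The result shows that the model using SEs as input units obtains an MOS of 4.25, which is 0.82 points higher than the traditional one using dictionary entries as input. A Lhasa Tibetan speech database, which includes 2,475 sentences and 3.95 hours of speech in total, is used in this experiment. The database is annotated on two levels: i) tonal annotation, which includes the annotation of tonal value and tone sandhi domain, and ii) grammatical annotation, which includes word segmentation, POS tagging, and the annotation of function words. The resulting alignment of tonal and grammatical annotations demonstrates that, among 25,265 word boundaries, there is a 15% inconsistency between word boundaries and tone sandhi boundaries: the 85% consistency is found in content words, such as nouns and adjectives, while inconsistency only occurs in verb phrases. For example, verb phrases like "negation adverb + verb" and "verb + topic marker" are grouped into a tone sandhi domain. It is suggested that when homographic characters play different roles in sentences, such a difference is exhibited through varying forms of tonal realization. Specifically in Lhasa Tibetan, there are three tonal patterns for homographic disyllables (tone sandhi, citation tone plus citation tone, and citation tone plus tone loss), which express various s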
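
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic, and the boundary comparison it describes can be illustrated on a toy example. The following is only a minimal sketch under stated assumptions: the function `boundary_consistency` and the toy boundary lists are hypothetical and are not the authors' alignment procedure. Because the 15% rate is a rounded figure, the derived boundary count is approximate.

```python
def boundary_consistency(word_boundaries, sandhi_boundaries):
    """Share of word boundaries that also fall on a tone sandhi domain boundary.

    Both arguments are iterables of syllable offsets (hypothetical encoding:
    offset k means "a boundary after the k-th syllable").
    """
    words = set(word_boundaries)
    if not words:
        raise ValueError("no word boundaries to compare")
    return len(words & set(sandhi_boundaries)) / len(words)


# Toy utterance of 6 syllables (invented data, for illustration only).
# Syllables 1-2 are two separate words (e.g. negation adverb + verb)
# that share one sandhi domain, so the word boundary after syllable 1
# has no matching sandhi boundary.
toy_word_boundaries = [1, 2, 4, 6]
toy_sandhi_boundaries = [2, 4, 6]
toy_rate = boundary_consistency(toy_word_boundaries, toy_sandhi_boundaries)

# Figures stated in the abstract.
mos_se_model = 4.25          # MOS of the SE-entry model
mos_gain = 0.82              # reported advantage over the dictionary-entry model
total_word_boundaries = 25265
inconsistency_rate = 0.15    # rounded percentage from the abstract
sentences = 2475
hours = 3.95

# Values implied by those figures (derived here, not stated in the source).
mos_dictionary_model = round(mos_se_model - mos_gain, 2)
approx_inconsistent = round(total_word_boundaries * inconsistency_rate)
avg_sentence_seconds = hours * 3600 / sentences

print(f"toy consistency rate: {toy_rate:.2f}")
print(f"implied dictionary-entry MOS: {mos_dictionary_model}")
print(f"approx. inconsistent boundaries: {approx_inconsistent}")
print(f"average sentence length: {avg_sentence_seconds:.2f} s")
```

On the reported numbers, this gives an implied MOS of 3.43 for the dictionary-entry model, roughly 3,790 word boundaries that do not coincide with a sandhi boundary, and an average sentence length of about 5.7 seconds.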

Keywords: tone sandhi domain; tone sandhi language; tonal system; basic running unit of language; speech synthesis

Classification: H214 [Language and Script: Minority Languages]

 
