Authors: 拉巴顿珠 (LHAKPA Dondrub) [1,2]; 扎西多吉 (ZHAXI Duoji) [1,2]; 珠杰 (ZHU Jie) [1,2]
Affiliations: [1] School of Information Science and Technology, Tibet University, Lhasa 850000, China; [2] Tibet Informatization Collaborative Innovation Center Jointly Built by the Province and the Ministry, Lhasa 850000, China
Source: 《吉林大学学报(工学版)》 (Journal of Jilin University: Engineering and Technology Edition), 2024, No. 12, pp. 3577-3588 (12 pages)
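Funding: National Natural Science Foundation project (62406256); Ministry of Education Humanities and Social Sciences Research Project (21YJCZH059); 2025 Natural Science Foundation of Tibet Autonomous Region project (ZRKX2025000068); Tibet University research project for in-service doctoral candidates and incoming postdoctoral researchers (zbds202326); Tibet University Cultivation Program project (ZDQMJH20-09).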
Abstract: Modern Tibetan text takes complex, varied, and nonstandard written forms, which degrades the performance of speech synthesis systems. To address this problem, this paper proposes a Tibetan text standardization method that is easy to maintain and extend. First, the different forms in which Tibetan marker symbols and non-Tibetan special symbols from other languages appear in Tibetan text are analyzed in depth, and the special symbols are classified by their features. Second, based on the resulting categories, writing rules are established for converting 15 types of special symbols into Tibetan. Finally, using 13,490 sentences as experimental data, a Tibetan grapheme-to-phoneme conversion test is used to identify special symbols and check the validity of Tibetan syllables in the text, and sentences containing special symbols are standardized by rule matching. The experimental results show that the omission rate of Tibetan phoneme transcription was as high as 4.69% before standardization and fell to 0.01% after standardization, and that the accuracy of Tibetan text standardization reached 99%.
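Keywords: computer application technology; Tibetan text analysis; text standardization; speech synthesis; special symbols; grapheme-to-phoneme conversion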
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
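Note: the abstract describes classifying special symbols and then standardizing sentences by rule matching. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The rule names, the patterns, the `normalize` and `non_tibetan_chars` helpers, and the angle-bracket placeholders (which stand in for real Tibetan spoken forms) are all assumptions made for illustration. The only external fact relied on is that the Unicode Tibetan block spans U+0F00 to U+0FFF.

```python
import re

# Hypothetical rule table: (category, pattern, renderer).
# Renderers emit angle-bracket placeholders, NOT real Tibetan readings.
# Order matters: more specific categories come first.
RULES = [
    ("percent", r"\d+(?:\.\d+)?%", lambda t: f"<percent:{t[:-1]}>"),
    ("number", r"\d+(?:\.\d+)?", lambda t: f"<number:{t}>"),
]

# One combined scanner, so each span of text is rewritten by exactly one rule.
SCANNER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in RULES))
RENDER = {name: render for name, _, render in RULES}


def normalize(sentence: str) -> str:
    """Replace every span matched by a rule with that rule's rendering."""
    return SCANNER.sub(lambda m: RENDER[m.lastgroup](m.group()), sentence)


def non_tibetan_chars(text: str) -> list[str]:
    """List non-space characters outside the Unicode Tibetan block."""
    return [
        ch for ch in text
        if not ch.isspace() and not ("\u0f00" <= ch <= "\u0fff")
    ]


if __name__ == "__main__":
    raw = "བོད་ 50% 13490"
    print(non_tibetan_chars(raw))  # symbols that would need a rule
    print(normalize(raw))
```

Keeping the rules in a single ordered table is one way to get the maintainability and extensibility the abstract mentions: supporting a new symbol category would mean adding one entry rather than changing the matching logic.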