A digital protection method for the Bai language based on mutual information and T-score


Authors: SHI Hong-zhen; LI Shun-liang; LUO Xin-lei (School of Electrical and Information Engineering, Yunnan Minzu University, Kunming 650500, China)

Affiliation: [1] School of Electrical and Information Engineering, Yunnan Minzu University, Kunming 650500, Yunnan, China

Source: Journal of Yunnan Minzu University: Natural Sciences Edition, 2020, No. 4, pp. 371-375 (5 pages)

Funding: Applied Basic Research Program of Yunnan Province (2015FD033)

Abstract: The Bai language, an unwritten minority language, is an important carrier of Bai culture. In recent years, the rapid development of tourism, the economy, and other factors has made the Sinicization of the Bai language increasingly pronounced. Local government and scholars have been working to protect the language, with particular emphasis on digital preservation of speech; few researchers, however, have addressed digital preservation of text based on the pinyin scheme. Based on the Bai-Han Dictionary compiled by Zhao Yansun and Xu Lin, this paper attempts to build a Bai pinyin corpus. It draws on two measures from computational linguistics, the mutual information (MI) value for association strength and the T-score for confidence, and uses their complementarity to propose a classification model for lexical collocations that computes the collocation reliability between a headword and its collocates. Experimental results show that the method can comprehensively and effectively identify, in quantitative terms, both the common typical collocations and the low-frequency fixed collocations of a headword. This lays a foundation for Bai-Chinese machine translation and supports the protection and transmission of the Bai language.

Keywords: corpus; lexical collocation; mutual information; T-score

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
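
The abstract names MI and the T-score but gives neither the formulas nor the decision rule of the classification model. As an illustration only, the sketch below uses the textbook corpus-linguistics definitions, MI = log2(f(x,y) * N / (f(x) * f(y))) and t = (f(x,y) - f(x) * f(y) / N) / sqrt(f(x,y)), where f(x) and f(y) are the headword and collocate frequencies, f(x,y) is their co-occurrence frequency, and N is the corpus size. The thresholds (MI >= 3, t >= 2), the function names, and the four category labels are assumptions made here for demonstration; they are not taken from the paper and may differ from its model.

```python
import math

# Assumed cut-offs, often quoted in corpus linguistics; NOT taken from the paper.
MI_THRESHOLD = 3.0
T_THRESHOLD = 2.0


def mutual_information(f_xy, f_x, f_y, n):
    """MI in bits: log2 of observed over expected co-occurrence frequency."""
    expected = f_x * f_y / n
    return math.log2(f_xy / expected)


def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def classify_collocation(f_xy, f_x, f_y, n):
    """Combine both measures into one hypothetical four-way label."""
    if f_xy == 0:
        return "no co-occurrence"
    strong = mutual_information(f_xy, f_x, f_y, n) >= MI_THRESHOLD
    confident = t_score(f_xy, f_x, f_y, n) >= T_THRESHOLD
    if strong and confident:
        return "common typical collocation"
    if strong:
        # High association but too few observations for a high T-score.
        return "low-frequency fixed collocation"
    if confident:
        return "frequent but weakly associated"
    return "unreliable"


if __name__ == "__main__":
    # Toy frequencies invented for illustration; not data from the Bai corpus.
    print(classify_collocation(f_xy=25, f_x=100, f_y=50, n=10_000))
    print(classify_collocation(f_xy=3, f_x=4, f_y=4, n=10_000))
```

The two measures are combined because they fail in opposite directions: MI rewards rare pairs that almost always occur together, while the T-score rewards pairs with enough observations to be trusted. A pair that passes only the MI test is what this sketch treats as a low-frequency fixed collocation.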

 
