Chinese-Uyghur sentence alignment based on hybrid method and regression check  Times cited: 1

Authors: LI Bin (李斌); ASKAR Hamdulla (艾斯卡尔·艾木都拉)[1] (College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)

Affiliation: [1] College of Information Science and Engineering, Xinjiang University

Source: Video Engineering (电视技术), 2019, No. 13, pp. 1-5 (5 pages)

Funding: National Natural Science Foundation of China (No. 61562081)

Abstract: This paper examines the difficulties of sentence segmentation and sentence alignment when processing raw Chinese and Uyghur corpora, presents solutions, and proposes a hybrid method with a regression check for aligning a Chinese-Uyghur parallel corpus. Sentences are first aligned using anchor points combined with a dictionary. Then, based on a length model, a linear regression is performed with ordinary least squares: the correlation coefficient is computed, a threshold is determined, and the best-fit line is fitted, so that alignment errors are checked and eliminated automatically. The result is a Chinese-Uyghur bilingual parallel corpus. Experiments show that the method effectively improves the accuracy and recall of sentence alignment and the efficiency of building a Chinese-Uyghur parallel corpus.
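The length-model regression check summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentence length is measured in characters, a pair is flagged when its residual from the fitted line exceeds k residual standard deviations (k = 2 here), and the minimum acceptable correlation (0.8) is a hypothetical value. The record does not give the paper's actual length unit or threshold rule, and the helper names `ols_fit` and `regression_check` are hypothetical.

```python
from math import sqrt


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope of the best-fit line
    a = mean_y - b * mean_x       # intercept
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, b, r


def regression_check(pairs, k=2.0, min_r=0.8):
    """Flag aligned sentence pairs whose lengths deviate from the fitted line.

    pairs: list of (chinese_sentence, uyghur_sentence) tuples.
    k: residual threshold in residual standard deviations (assumed rule).
    min_r: minimum correlation for the fit to be trusted (hypothetical value).
    Returns (a, b, r, suspect_indices).
    """
    if len(pairs) < 3:
        raise ValueError("need at least 3 pairs to estimate residual spread")
    xs = [len(zh) for zh, _ in pairs]   # Chinese length in characters (assumed unit)
    ys = [len(ug) for _, ug in pairs]   # Uyghur length in characters (assumed unit)
    a, b, r = ols_fit(xs, ys)
    if r < min_r:
        raise ValueError(f"correlation {r:.3f} below {min_r}; fit not reliable")
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom (two fitted parameters).
    sigma = sqrt(sum(e * e for e in residuals) / (len(residuals) - 2))
    suspects = [i for i, e in enumerate(residuals) if abs(e) > k * sigma]
    return a, b, r, suspects
```

Pairs returned in `suspects` would then be candidates for re-alignment or manual review. A sigma-based threshold is only one possible choice; because a single large error inflates the residual spread, it can mask other errors in small samples.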

Keywords: parallel corpus; sentence alignment; linear regression; translation corpus

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
