Authors: LI Bin (李斌); ASKAR Hamdulla (艾斯卡尔·艾木都拉)[1] (College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)
Affiliation: [1] College of Information Science and Engineering, Xinjiang University
Source: Video Engineering (《电视技术》), 2019, No. 13, pp. 1-5 (5 pages)
Funding: National Natural Science Foundation of China (61562081)
Abstract: This paper examines the difficulties of sentence segmentation and sentence alignment in processing raw Chinese and Uyghur corpora, together with their solutions, and proposes a hybrid alignment method and a regression-based verification method for building a Chinese-Uyghur parallel corpus. Sentences are first aligned using anchor points combined with a bilingual dictionary. Then, based on a length model, linear regression is performed with ordinary least squares: the correlation coefficient is computed, a threshold is determined, and a best-fit line is fitted, so that misaligned pairs are detected and corrected automatically. The verified pairs form the Chinese-Uyghur bilingual parallel corpus. Experiments show that the method effectively improves the precision and recall of sentence alignment and makes building the Chinese-Uyghur parallel corpus more efficient.
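The regression check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: sentence length is measured in Unicode characters, the model is a single line y = a + bx (Uyghur length against Chinese length) fitted by ordinary least squares, and a pair is flagged as a likely misalignment when its residual exceeds k times the residual standard error. The function names and the default k = 2.0 are hypothetical; the paper's actual threshold is not given in this record.

```python
import math
from typing import List, Tuple


def ols_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation coefficient.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two sentence pairs")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all source-side lengths are identical")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def flag_suspect_pairs(
    pairs: List[Tuple[str, str]], k: float = 2.0  # k is an assumed threshold
) -> Tuple[float, float, float, List[int]]:
    """Flag aligned (Chinese, Uyghur) pairs whose lengths deviate from the fitted line.

    Assumption: length = number of Unicode characters in each sentence.
    Returns (slope, intercept, r, indices of suspect pairs).
    """
    xs = [float(len(zh)) for zh, _ in pairs]
    ys = [float(len(ug)) for _, ug in pairs]
    slope, intercept, r = ols_fit(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    n = len(residuals)
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2)) if n > 2 else 0.0
    suspects = [i for i, e in enumerate(residuals) if abs(e) > k * s]
    return slope, intercept, r, suspects


if __name__ == "__main__":
    # Synthetic lengths only: placeholder strings stand in for real sentences.
    demo = [("a" * x, "b" * (2 * x)) for x in range(10, 101, 10)]
    demo[4] = ("a" * 50, "b" * 300)  # deliberately misaligned pair
    slope, intercept, r, suspects = flag_suspect_pairs(demo)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} suspects={suspects}")
```

In this sketch the correlation coefficient r serves as a sanity check on the length model itself: a weak r suggests length alone is a poor signal for that corpus, and the residual threshold should not be trusted.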
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]