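Authors: 麦合甫热提[1] 麦热哈巴.艾力[2] 米莉万.雪合来提
Affiliations: [1] Academic Affairs Office, Xinjiang University, Urumqi, Xinjiang 830046; [2] School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046
Source: Computer Engineering and Design (《计算机工程与设计》), 2015, No. 8, pp. 2297-2302 (6 pages)
Funding: National Natural Science Foundation of China (61262061); Autonomous Region Science and Technology Plan Fund Project (201423120)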
Abstract: In Uyghur, the complex morphology of words is the main cause of data sparseness. To reduce the negative effects of data sparseness on word alignment and machine translation, while retaining as much of the semantic information carried by affixes as possible, a "separating-dropping" scheme for affixes is proposed. Based on statistical analysis, Uyghur words are first segmented into stems and affixes; affixes whose semantic information has a high probability of being explicitly translated are separated from the stem, and affixes with a low probability are dropped. The scheme is applied to Uyghur nouns and verbs, and 9 templates are constructed at graded levels for the experiments. The experimental results show that the scheme curbs the excessive sentence length caused by stem-affix segmentation, increases the number of Uyghur-Chinese word pairs, and improves the quality of Uyghur-Chinese machine translation, which confirms the effectiveness of the scheme.
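Keywords: word alignment; Uyghur-Chinese machine translation; Uyghur-Chinese word alignment; affix granularity; morphological analysis
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]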
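The separating-dropping idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the threshold value, the `+` marker for separated affixes, the affix labels (`PL`, `ACC`, `PAST`), and the probabilities are all hypothetical placeholders, since the record gives no numeric cutoff, token format, or affix statistics. It only shows the decision rule: an affix whose translated probability is high stays as its own token, and one whose probability is low is discarded.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff; the abstract does not state a numeric threshold.
SEPARATE_THRESHOLD = 0.5


def apply_separate_drop(
    stem: str,
    affixes: List[str],
    translated_prob: Dict[str, float],
    threshold: float = SEPARATE_THRESHOLD,
) -> List[str]:
    """Return the tokens for one word after the separating-dropping rule.

    Affixes at or above the threshold become separate tokens (marked with
    a leading '+', an illustrative convention); the rest are dropped.
    Affixes missing from the table are treated as probability 0.0.
    """
    tokens = [stem]
    for affix in affixes:
        if translated_prob.get(affix, 0.0) >= threshold:
            tokens.append("+" + affix)  # "separate": keep as its own token
        # otherwise "drop": the affix is omitted from the output
    return tokens


def process_sentence(
    segmented_words: List[Tuple[str, List[str]]],
    translated_prob: Dict[str, float],
    threshold: float = SEPARATE_THRESHOLD,
) -> List[str]:
    """Apply the rule to every (stem, affixes) pair in a segmented sentence."""
    output: List[str] = []
    for stem, affixes in segmented_words:
        output.extend(apply_separate_drop(stem, affixes, translated_prob, threshold))
    return output


if __name__ == "__main__":
    # Placeholder affix labels and made-up probabilities, for illustration only.
    probs = {"PL": 0.8, "ACC": 0.1, "PAST": 0.7}
    sentence = [("stem1", ["PL", "ACC"]), ("stem2", ["PAST"])]
    print(process_sentence(sentence, probs))
```

Dropping the low-probability affixes is what keeps the output shorter than full stem-affix segmentation would: in the example, the sentence yields four tokens rather than five.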