Source: 《中文信息学报》 (Journal of Chinese Information Processing), 2006, Issue B03, pp. 66-70 (5 pages)
Funding: National 863 Program of China (2004AA117010); National Natural Science Foundation of China (60373080)
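Abstract: Over the past decade, statistical methods have attracted wide attention in machine translation and have gradually become the mainstream approach in machine translation research. A key foundation for building a high-quality statistical machine translation (SMT) system is a large-scale, high-quality bilingual parallel corpus. At present, most parallel corpora contain errors or noise, which greatly degrade the performance of SMT systems. Screening the sentence pairs of a corpus by hand is time-consuming and labor-intensive, so this paper studies a ranking model that helps address the problem. The model takes several factors into account, including a language model, length information, and meaning correspondence. Because current SMT systems all rely on word alignment information, word alignment factors are also incorporated into the model. Experimental results at the end of the paper show that the model performs well.

English abstract (as published, typos corrected): In the past ten years, statistical methods have become increasingly popular in machine translation research. The performance of a statistical machine translation system depends on many aspects, such as the translation model, the search strategy, and the parallel corpus. In particular, the parallel corpus has become an essential resource for SMT systems. Many parallel corpora contain errors, and filtering out bad sentence pairs is tiring and time-consuming. This paper proposes a ranking model that helps deal with this problem. The model considers both syntactic and semantic features of sentence pairs. Since most current statistical machine translation models depend on word alignment, features related to word alignment information are also included. An experiment at the end of the paper shows that the model has promising performance.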
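The ranking idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual feature definitions or weights, so everything below is an assumption: a toy bilingual lexicon stands in for meaning correspondence, an add-one smoothed unigram model stands in for the language model, a greedy dictionary-based one-to-one link count stands in for word alignment, and the hypothetical `WEIGHTS` combine the features linearly. Pairs are sorted best-first so the lowest-ranked ones can be reviewed or discarded.

```python
import math
from collections import Counter

# Hypothetical feature weights; not taken from the paper.
WEIGHTS = {"length": 0.2, "lm": 0.2, "coverage": 0.3, "alignment": 0.3}


def length_score(src, tgt):
    """Ratio of shorter to longer sentence length in tokens; 1.0 means equal lengths."""
    if not src or not tgt:
        return 0.0
    return min(len(src), len(tgt)) / max(len(src), len(tgt))


class UnigramLM:
    """Add-one smoothed unigram model over target-side tokens (stand-in for a real LM)."""

    def __init__(self, corpus):
        self.counts = Counter(tok for sent in corpus for tok in sent)
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # reserve one slot for unseen tokens

    def fluency(self, sent):
        """Geometric-mean token probability, so sentences of different lengths are comparable."""
        if not sent:
            return 0.0
        logp = sum(math.log((self.counts[t] + 1) / (self.total + self.vocab)) for t in sent)
        return math.exp(logp / len(sent))


def coverage_score(src, tgt, lexicon):
    """Fraction of source tokens with at least one dictionary translation in the target."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    return sum(1 for s in src if lexicon.get(s, set()) & tgt_set) / len(src)


def alignment_score(src, tgt, lexicon):
    """Greedy one-to-one dictionary links; share of tokens on both sides that get linked."""
    if not src or not tgt:
        return 0.0
    used, links = set(), 0
    for s in src:
        for j, t in enumerate(tgt):
            if j not in used and t in lexicon.get(s, set()):
                used.add(j)
                links += 1
                break
    return 2 * links / (len(src) + len(tgt))


def pair_score(src, tgt, lexicon, lm):
    """Weighted linear combination of the four hypothetical features."""
    feats = {
        "length": length_score(src, tgt),
        "lm": lm.fluency(tgt),
        "coverage": coverage_score(src, tgt, lexicon),
        "alignment": alignment_score(src, tgt, lexicon),
    }
    return sum(WEIGHTS[k] * v for k, v in feats.items())


def rank_pairs(pairs, lexicon, lm):
    """Sort (src, tgt) token-list pairs best-first; the tail holds likely noisy pairs."""
    return sorted(pairs, key=lambda p: pair_score(p[0], p[1], lexicon, lm), reverse=True)


if __name__ == "__main__":
    lexicon = {"我": {"i"}, "爱": {"love"}, "猫": {"cat", "cats"}}
    lm = UnigramLM([["i", "love", "cats"], ["i", "love", "dogs"]])
    good = (["我", "爱", "猫"], ["i", "love", "cats"])
    bad = (["我", "爱", "猫"], ["the", "stock", "market", "fell", "today"])
    print(rank_pairs([bad, good], lexicon, lm)[0] is good)  # True
```

A linear combination is simply the easiest way to show several signals feeding one ranking; a practical filter would more likely use a trained n-gram language model, statistically learned word alignments, and weights tuned on held-out data.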
Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]