Extraction of Machine Translation Units Based on “Similarity and Difference”

Original title: 基于“相同与差异”的机译单元的自动提取研究

Affiliation: [1] Center for Speech Interaction Technology Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080

Source: Journal of Chinese Information Processing (《中文信息学报》), 2003, No. 3, pp. 34-40 (7 pages)

Abstract: Machine translation units extracted from bilingual corpora give better coverage of real-language text. This paper presents an algorithm that obtains machine translation units in two steps. First, it finds the "similarity and difference" parts between two bilingual sentence pairs, requiring that these parts do not consist entirely of high-frequency function words. Second, it aligns the similarity and difference parts using a translation lexicon and a dynamic programming algorithm. Three filters then check the correctness of each candidate unit: a bilingual chunk similarity filter checks semantic correspondence between the source and target parts, a part-of-speech similarity filter checks syntactic correspondence, and a begin-and-end stopword filter checks whether the collocation is correct. In a sampled evaluation, the extracted machine translation units reach a precision of 86% and a recall of about 61.34%. The algorithm offers a new, practical way to obtain machine translation units.
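The first step and the last filter described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on one language side only, it uses Python's `difflib.SequenceMatcher` as a stand-in for whatever comparison the paper uses, and the function-word list and all function names (`same_and_diff`, `has_content_word`, `passes_edge_stopword_filter`) are hypothetical. The lexicon-based alignment and the two similarity-score filters are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical high-frequency function words (an assumption for illustration;
# the abstract does not list the words actually used).
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "is", "in"}


def same_and_diff(tokens_a, tokens_b):
    """Split two token lists into shared ("same") and differing ("diff") spans."""
    same, diff = [], []
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            same.append(tokens_a[i1:i2])
        else:
            # "replace", "delete" and "insert" all mark a difference.
            diff.append((tokens_a[i1:i2], tokens_b[j1:j2]))
    return same, diff


def has_content_word(span):
    """True if the span contains at least one word that is not a function word."""
    return any(tok.lower() not in FUNCTION_WORDS for tok in span)


def passes_edge_stopword_filter(span, stopwords=FUNCTION_WORDS):
    """Reject empty candidates and candidates that begin or end with a stopword."""
    return (
        bool(span)
        and span[0].lower() not in stopwords
        and span[-1].lower() not in stopwords
    )


sentence_a = "I bought a red book yesterday".split()
sentence_b = "I bought a blue pen yesterday".split()
same, diff = same_and_diff(sentence_a, sentence_b)
candidates = [span for span in same if has_content_word(span)]
```

In this example the shared spans are `I bought a` and `yesterday`, and the differing span pairs `red book` with `blue pen`. The span `I bought a` survives the function-word check but fails the begin-and-end stopword check because it ends in `a`, which is the kind of incomplete collocation that filter is meant to catch. In the setting the abstract describes, the same comparison would run on both the source and target sides of the two sentence pairs before alignment.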

Keywords: artificial intelligence; machine translation; bilingual corpus; machine translation unit; similarity and difference

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
