Extraction of Translation Lexicon with Multi-word Units for EBMT
(Chinese title: EBMT系统中的多词单元翻译词典获取研究)  Cited by: 5


Authors: Cheng Jie (程洁)[1], Du Limin (杜利民)[1]

Affiliation: [1] Speech Interaction Technology Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080

Source: Journal of Chinese Information Processing (《中文信息学报》), 2004, No. 1, pp. 55-61 (7 pages)

Abstract: Example-based machine translation (EBMT) is a corpus-based approach to machine translation whose central idea is to translate by analogy. How to extract a practical translation lexicon from a corpus to assist such a system has drawn growing attention. This paper examines how to obtain a translation lexicon of multi-word units by combining threshold-based extraction with association-based extraction. Used alone, neither method meets the requirements of lexicon extraction well: threshold-based extraction depends too heavily on subjective judgment, and association-based extraction is too inefficient. The proposed algorithm first uses thresholds to extract candidate multi-word units, applying four rules that weaken the subjective influence while ensuring that all multi-word units are covered, which reduces the effect of the imprecision inherent in the thresholds. It then passes the computed results through three layers of filtering to further improve accuracy. The algorithm also merges the extraction of two parts of the lexicon, single words translated as multi-word units and multi-word units translated as multi-word units, which improves efficiency.

Keywords: artificial intelligence; machine translation; EBMT; translation lexicon; multi-word units

Classification: TP391.2 [Automation and Computer Technology: Computer Application Technology]
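The two-stage idea in the abstract (a frequency threshold to pick candidate units, then an association score to pair them across languages) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not give the four rules or the three filtering layers, so the code below assumes a plain count threshold and uses the Dice coefficient as a stand-in association measure. The function names and the parameters `min_count`, `min_dice`, and `max_len` are hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_len):
    """Yield every contiguous word sequence of length 1..max_len as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def extract_lexicon(pairs, min_count=2, min_dice=0.8, max_len=3):
    """Return {(source_unit, target_unit): dice} for a sentence-aligned corpus.

    pairs is a list of (source_tokens, target_tokens). All thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    src_freq, tgt_freq, co_freq = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each unit at most once per sentence pair.
        src_units = set(ngrams(src, max_len))
        tgt_units = set(ngrams(tgt, max_len))
        src_freq.update(src_units)
        tgt_freq.update(tgt_units)
        co_freq.update(product(src_units, tgt_units))

    lexicon = {}
    for (s, t), both in co_freq.items():
        # Stage 1 (threshold): discard units that occur too rarely.
        if src_freq[s] < min_count or tgt_freq[t] < min_count:
            continue
        # Keep only entries whose target side is a multi-word unit; the
        # source side may be a single word or a multi-word unit, so both
        # kinds of entry are collected in one pass.
        if len(t) < 2:
            continue
        # Stage 2 (association): Dice coefficient over sentence-pair counts.
        dice = 2 * both / (src_freq[s] + tgt_freq[t])
        if dice >= min_dice:
            lexicon[(s, t)] = dice
    return lexicon


pairs = [
    (["机器", "翻译", "系统"], ["machine", "translation", "system"]),
    (["机器", "翻译", "方法"], ["machine", "translation", "method"]),
    (["语音", "系统"], ["speech", "system"]),
]
lexicon = extract_lexicon(pairs)
```

On this toy corpus the only surviving target unit is ("machine", "translation"), and it is paired not just with ("机器", "翻译") but also with the single words "机器" and "翻译". Partial matches of this kind are the sort of noise that additional filtering would have to remove.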

 
