Sequence Intersection Based Phrase Translation Extraction from Bilingual Corpus (基于序列相交的短语译文获取)  Cited by: 3


Authors: Wang Chen (王辰)[1], Song Guolong (宋国龙)[1], Wu Honglin (吴宏林)[1], Zhang Li (张俐)[1], Liu Shaoming (刘绍明)[2]

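Affiliations: [1] Natural Language Processing Laboratory, Northeastern University, Shenyang, Liaoning 110004, China; [2] Fuji Xerox Co., Ltd., Kanagawa, Japan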

Source: Journal of Chinese Information Processing (《中文信息学报》), 2009, No. 1, pp. 38-43 (6 pages)

Abstract: Phrase translation extraction is one of the core techniques of example-based machine translation (EBMT), and its accuracy directly affects the performance of an EBMT system. This paper proposes a phrase translation extraction method based on sequence intersection. The method treats each sentence as a sequence of words. Given a phrase to translate, it first retrieves from a Chinese-Japanese sentence-aligned corpus all source sentences that contain the phrase, then computes the pairwise sequence intersections of the corresponding target sentences to obtain the phrase translation. By fully exploiting the information in the sentence-aligned bilingual corpus, the approach obtains high-quality phrase translations without word alignment, syntactic parsing, or dictionaries. Experiments show that the accuracy of the extracted phrase translations exceeds 80%.
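The pipeline described in the abstract (retrieve the source sentences containing the phrase, then intersect their target sentences pairwise) can be illustrated with a minimal Python sketch. The abstract does not define the intersection operator or how candidates are ranked, so everything here is an assumption made for illustration: the intersection is taken to be the longest contiguous word run shared by two target sentences, candidates are ranked by how many sentence pairs produce them, and the function names (`contains`, `longest_common_run`, `extract_translation`) and the pre-segmented toy corpus are hypothetical. It is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def contains(sentence, phrase):
    """Return True if `phrase` occurs as a contiguous word run in `sentence`."""
    n = len(phrase)
    return any(list(sentence[i:i + n]) == list(phrase)
               for i in range(len(sentence) - n + 1))


def longest_common_run(a, b):
    """Longest contiguous word run shared by a and b.

    Assumed stand-in for the paper's "sequence intersection";
    the abstract does not specify the actual operator.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return tuple(a[best_end - best_len:best_end])


def extract_translation(corpus, phrase):
    """Pick the run produced by the most target-sentence pairs (assumed ranking)."""
    targets = [tgt for src, tgt in corpus if contains(src, phrase)]
    votes = Counter()
    for t1, t2 in combinations(targets, 2):
        run = longest_common_run(t1, t2)
        if run:
            votes[run] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy, pre-segmented Chinese-Japanese sentence pairs (invented for illustration).
corpus = [
    (["我", "喜欢", "机器", "翻译"], ["私", "は", "機械", "翻訳", "が", "好き", "です"]),
    (["机器", "翻译", "很", "难"], ["機械", "翻訳", "は", "難しい"]),
    (["他", "研究", "机器", "翻译"], ["彼", "は", "機械", "翻訳", "を", "研究", "する"]),
    (["今天", "天气", "好"], ["今日", "は", "天気", "が", "いい"]),
]

result = extract_translation(corpus, ["机器", "翻译"])
print(result)  # ('機械', '翻訳')
```

In this toy run, one pair of target sentences shares the longer run "は 機械 翻訳", but the other two pairs agree on "機械 翻訳", so voting across pairs filters out the spurious particle. Only sentence-level alignment is used, which matches the abstract's claim that no word alignment, parsing, or dictionary is required.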

Keywords: computer application; Chinese information processing; EBMT; phrase translation extraction; sequence intersection

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
