Acquiring Chinese paraphrases based on random walk of N steps  (cited by: 1)


Authors: Ma Jun[1], Zhang Yujie[1], Xu Jin'an[1], Chen Yufeng[1]

Affiliation: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2017, No. 8, pp. 1066-1077 (12 pages)

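Funding: Supported by the Talent Fund of Beijing Jiaotong University (Grant No. KKRC11001532); the National Natural Science Foundation of China (Grant Nos. 61370130, 61473294); and the Fundamental Research Funds for the Central Universities (Grant No. 2015JBM033)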

Abstract: When acquiring paraphrase knowledge from large-scale bilingual corpora, the conventional "pivot" approach can only capture paraphrases that are reachable within two steps. To address this limitation, we build a translation relation graph from phrase pairs that are translations of each other across languages, and we propose a paraphrase acquisition algorithm based on a random walk of N steps, which recovers more potential paraphrases than existing methods. We describe how the translation relation graph is constructed from a Chinese-English phrase translation table, the N-step random walk algorithm, and a confidence measure for the obtained paraphrases based on the expected number of steps. We also propose a method for extending the translation relation graph to multiple language pairs, such as English-Japanese phrase translation relations. We performed experiments on the NTCIR Chinese-English and English-Japanese bilingual parallel corpora and compared the results with those of conventional methods. The experimental results show that the proposed approach acquires more paraphrases, and that extending the graph with the English-Japanese phrase translations yields further potential paraphrases.
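The idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy phrase table, the `zh:`/`en:` node prefixes, the function names, and in particular the confidence measure. The abstract only says that confidence is based on the expected number of steps, so the truncated expected hitting time used here is a stand-in, not the paper's formula. The sketch shows that a two-step walk reproduces the pivot candidates, while a longer walk reaches phrases the pivot method cannot.

```python
from collections import defaultdict

# Hypothetical toy translation relation graph (not data from the paper).
# Each node maps to its translations with a transition probability; the
# language prefix keeps phrases of different languages apart.
EDGES = {
    "zh:A": {"en:x": 0.5, "en:y": 0.5},
    "zh:B": {"en:x": 1.0},
    "zh:C": {"en:y": 0.5, "en:z": 0.5},
    "zh:D": {"en:z": 1.0},
    "en:x": {"zh:A": 0.5, "zh:B": 0.5},
    "en:y": {"zh:A": 0.5, "zh:C": 0.5},
    "en:z": {"zh:C": 0.5, "zh:D": 0.5},
}


def step(dist, edges):
    """Advance a probability distribution over nodes by one walk step."""
    nxt = defaultdict(float)
    for node, p in dist.items():
        for neighbour, w in edges[node].items():
            nxt[neighbour] += p * w
    return dict(nxt)


def candidates(start, n_steps, edges=EDGES):
    """Chinese phrases other than `start` that the walk can reach within n_steps."""
    found, dist = set(), {start: 1.0}
    for _ in range(n_steps):
        dist = step(dist, edges)
        found.update(n for n in dist if n.startswith("zh:") and n != start)
    return found


def truncated_expected_steps(start, target, n_steps, edges=EDGES):
    """Expected steps to first reach `target`, capped at n_steps (assumed metric).

    Walks that have not reached the target after n_steps count as n_steps,
    so a smaller value suggests a more reliable paraphrase candidate.
    """
    dist, expected, absorbed = {start: 1.0}, 0.0, 0.0
    for k in range(1, n_steps + 1):
        dist = step(dist, edges)
        hit = dist.pop(target, 0.0)  # mass arriving at target for the first time
        expected += k * hit
        absorbed += hit
    return expected + n_steps * (1.0 - absorbed)


pivot = candidates("zh:A", 2)      # two steps: the pivot-style candidates
extended = candidates("zh:A", 4)   # four steps: also reaches zh:D
```

In this toy graph, `zh:D` shares no English translation with `zh:A`, so it only appears once the walk is allowed more than two steps. Adding a further language pair would amount to adding more nodes and edges (for example, hypothetical `ja:` nodes linked to the `en:` nodes) to the same dictionary.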

Keywords: paraphrase acquisition; phrase translation table; translation relation graph; random walk; expected number of steps

Classification No.: TP391.2 [Automation and Computer Technology: Computer Application Technology]

 
