Sentence Alignment and Re-Alignment for Environmental Protection Texts in English-Chinese Parallel Corpus  Cited by: 4


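Author: Xiong Wenxin (熊文新)[1]

Affiliation: [1] National Research Centre for Foreign Language Education, Beijing Foreign Studies University, Beijing 100089, China

Source: New Technology of Library and Information Service (《现代图书情报技术》), 2013, No. 6, pp. 36-41 (6 pages)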

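Funding: This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Research Project "A Study of Idiosyncratic Word Combinations in English Based on Corpora and Corresponding Word Lists" (No. 09YJA740013); the National Social Science Fund of China project "Natural Language in the Service of Information Retrieval" (No. 11BYY051); and the Ministry of Education Program for New Century Excellent Talents (No. NCET-11-0591).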

Abstract: Sentence alignment is a crucial step in building a parallel corpus, and many tools are available for constructing language resources for machine translation systems. From the perspective of resource construction, this paper compares existing statistics-based sentence alignment tools in terms of ease of use and alignment quality, and finds that Champollion is better suited than other mainstream open-source tools to aligning English-Chinese parallel texts. Inspired by the "transformation-based error-driven" strategy, the author carries out a linguistic analysis of the alignment errors in Champollion's output and applies linguistic rules to realign them. For English-Chinese texts in the environmental protection domain, this realignment step, attached as a module to Champollion's output, raises sentence alignment precision from a baseline of 88.74% to 93.91%. The approach, which combines a statistics-based alignment tool with linguistic knowledge, is generally applicable to building large-scale bilingual corpora step by step and domain by domain.
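The abstract names the strategy but not the rules themselves, so the following is only a minimal sketch of how an error-driven realignment pass might be organized. It does not call Champollion: it assumes the aligner's output has already been parsed into "beads" (tuples of source and target sentence indices) and that a hand-checked gold alignment is available. The names `Bead`, `precision`, `realign`, and `relative_error_reduction` are illustrative, and `split_two_two` is a hypothetical placeholder rule, not one of the linguistic rules used in the paper.

```python
from typing import Callable, List, Set, Tuple

# A bead pairs source sentence indices with target sentence indices,
# e.g. ((1, 2), (1,)) is a 2-1 alignment. (Assumed representation.)
Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]
Rule = Callable[[List[Bead]], List[Bead]]


def precision(predicted: List[Bead], gold: Set[Bead]) -> float:
    """Fraction of predicted beads that exactly match a gold bead."""
    if not predicted:
        return 0.0
    return sum(bead in gold for bead in predicted) / len(predicted)


def split_two_two(beads: List[Bead]) -> List[Bead]:
    """Hypothetical rule: rewrite every 2-2 bead as two 1-1 beads."""
    out: List[Bead] = []
    for src, tgt in beads:
        if len(src) == 2 and len(tgt) == 2:
            out.append(((src[0],), (tgt[0],)))
            out.append(((src[1],), (tgt[1],)))
        else:
            out.append((src, tgt))
    return out


def realign(baseline: List[Bead], gold: Set[Bead],
            candidates: List[Rule]) -> Tuple[List[Bead], List[Rule]]:
    """Greedily apply the candidate rule that most improves precision
    against the gold alignment; stop when no rule helps."""
    current = baseline
    kept: List[Rule] = []
    remaining = list(candidates)
    while remaining:
        scored = [(precision(rule(current), gold), i)
                  for i, rule in enumerate(remaining)]
        best_score, best_index = max(scored)
        if best_score <= precision(current, gold):
            break  # no remaining rule reduces the error
        rule = remaining.pop(best_index)
        current = rule(current)
        kept.append(rule)
    return current, kept


def relative_error_reduction(before: float, after: float) -> float:
    """Share of the baseline error (1 - precision) that was removed."""
    return (after - before) / (1.0 - before)


if __name__ == "__main__":
    baseline = [((0,), (0,)), ((1, 2), (1, 2)), ((3,), (3,))]
    gold = {((0,), (0,)), ((1,), (1,)), ((2,), (2,)), ((3,), (3,))}
    fixed, kept = realign(baseline, gold, [split_two_two])
    print(precision(baseline, gold), precision(fixed, gold))
    # Precision figures reported in the abstract.
    print(relative_error_reduction(0.8874, 0.9391))
```

Read this way, the reported gain from 88.74% to 93.91% removes a little under half of the baseline's alignment errors, since the error rate falls from 11.26% to 6.09%.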

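Keywords: English-Chinese parallel corpus; environmental protection texts; sentence alignment; realignment; transformation-based error-driven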

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
