Parallel Anchor Matching Strategy for Long Texts: An Experiment of Automatic Alignment of Chinese-English Bilingual Text in Red Sorghum


Authors: LI Xin (李新); SUN Run (孙润). School of Foreign Languages, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China

Affiliation: [1] School of Foreign Languages, Guangdong Ocean University, Zhanjiang, Guangdong 524000

Source: Journal of Guangdong Polytechnic of Water Resources and Electric Engineering (《广东水利电力职业技术学院学报》), 2024, No. 3, pp. 105-108 (4 pages)

Funding: University-level general project of Guangdong Ocean University (C22862).

Abstract: Parallel corpus research has become an active area of linguistics in recent years and is increasingly applied in translation studies. A parallel corpus pairs text in one language with its translation in another. Building one combines human effort with computer technology; it reduces reliance on subjective judgment and provides a basis for studying texts by objectively describing the regularities of different languages. Because such corpora are large, relying solely on manual editing is time-consuming, labor-intensive, and error-prone, which makes automatic text alignment all the more important. In practice, however, automatic alignment is itself prone to errors, and alignment error propagation is a common problem when long bilingual texts are aligned by computer. To address this problem, this paper proposes a parallel anchor matching strategy, developed through an automatic alignment experiment on the Chinese and English texts of Red Sorghum (《红高粱家族》). Its core idea is to use strong similarity across multiple sentences to partition a long text into regions. The strategy effectively curbs alignment error propagation and reaches an accuracy of 99%. Red Sorghum is an important representative work of Mo Yan, winner of the Nobel Prize in Literature, and the strategy offers a practical reference for the efficient construction of long English-Chinese parallel corpora and for translation research.
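The core idea in the abstract, partitioning a long text at points where several consecutive sentences match strongly, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the window size, or the threshold. The names `jaccard`, `find_anchors`, `split_regions`, `window`, and `threshold` are all hypothetical, and the toy word-overlap score merely stands in for whatever cross-lingual similarity a real system would use.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open (start, end) sentence indices


def jaccard(x: str, y: str) -> float:
    """Toy similarity: word-set overlap (placeholder for a cross-lingual score)."""
    a, b = set(x.split()), set(y.split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_anchors(src: List[str], tgt: List[str],
                 sim: Callable[[str, str], float],
                 window: int = 2, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Greedily collect (i, j) anchors where `window` consecutive sentence
    pairs all score >= threshold. Anchors are strictly increasing in both
    texts, so they never cross."""
    anchors = []
    i, j_start = 0, 0
    while i + window <= len(src):
        for j in range(j_start, len(tgt) - window + 1):
            if all(sim(src[i + k], tgt[j + k]) >= threshold for k in range(window)):
                anchors.append((i, j))
                i += window            # skip past the matched run
                j_start = j + window
                break
        else:
            i += 1                     # no strong run starts here; move on
    return anchors


def split_regions(n_src: int, n_tgt: int,
                  anchors: List[Tuple[int, int]]) -> List[Tuple[Span, Span]]:
    """Cut both texts at every anchor, giving independent regions to align."""
    cuts = [(0, 0)] + anchors + [(n_src, n_tgt)]
    regions = []
    for (s0, t0), (s1, t1) in zip(cuts, cuts[1:]):
        if s1 > s0 or t1 > t0:         # drop empty regions
            regions.append(((s0, s1), (t0, t1)))
    return regions


# Toy data: the target side has two extra sentences in the middle.
src = ["a b c", "d e f", "x y", "g h i", "j k l"]
tgt = ["a b c", "d e f", "q", "r", "g h i", "j k l"]
anchors = find_anchors(src, tgt, jaccard)
regions = split_regions(len(src), len(tgt), anchors)
```

Each region can then be passed to an ordinary sentence aligner on its own, so a misalignment inside one region cannot spill past the next anchor. The greedy first-match search here is a simplification and can itself choose a wrong anchor when sentences repeat; a stricter threshold or a larger window trades coverage for safety.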

Keywords: corpus alignment; similarity; corpus; Red Sorghum (《红高粱家族》)

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
