Authors: 冯文贺 (Feng Wen-he); 李熳佳 (Li Man-jia); 张文娟 (Zhang Wen-juan)
Affiliations: [1] Lab of Language Engineering and Computing, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou 510420, China; [2] School of Computer Science and Engineering, Guangzhou Institute of Science and Technology, Guangzhou 510420, China
Source: Foreign Language Research (《外语学刊》), 2025, No. 1, pp. 39-46 (8 pages)
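Funding: Interim results of the Ministry of Education Humanities and Social Sciences Fund project "Research on Structural Discourse Quality Evaluation for Chinese-English Machine Translation" (24YJA740014); the Ministry of Education Humanities and Social Sciences Fund project "Automatic Construction of a Main-Subordinate Aligned Corpus of Chinese-English Complex Sentences for Machine Translation" (22YJCZH091); and the Guangdong Provincial Department of Education GK Characteristic Innovation Project "Research on Structural Discourse Quality Evaluation for Machine Translation" (2023WTSCXO17).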
Abstract: Long sentences remain a persistent difficulty for machine translation. Observing that a considerable number of commas and periods in Chinese text are interchangeable, this paper proposes the concepts of "implicit period" and "implicit comma" and implements their automatic recognition, so that long Chinese sentences can be converted into short sentences for Chinese-English machine translation. First, an implicit period and comma dataset is constructed by combining manual effort with semi-supervised learning, and recognition methods based on pre-trained models are implemented; Hierarchical BERT, the best-performing model, is used in the subsequent application. Then, a Chinese-English machine translation method based on implicit period and comma recognition is implemented. Experiments with pre-trained machine translation models on public news and literature translation test corpora show that, for the English translation of long Chinese sentences, the method improves BLEU overall relative to the baseline translation, and that on relatively robust machine translation models the improvement grows as sentences get longer.
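The split-then-translate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_at_implicit_periods` and `translate_long_sentence`, the `is_implicit_period` callback (a stand-in for the trained Hierarchical BERT recognizer), and the `translate` callback (a stand-in for a pre-trained machine translation model) are all hypothetical.

```python
from typing import Callable, List

# Hypothetical callback types: the record does not describe the real interfaces.
CommaClassifier = Callable[[str, int], bool]   # (text, comma index) -> is implicit period?
Translator = Callable[[str], str]              # Chinese sentence -> English sentence


def split_at_implicit_periods(text: str, is_implicit_period: CommaClassifier) -> List[str]:
    """Split a long Chinese sentence at periods and at commas flagged as implicit periods."""
    pieces: List[str] = []
    start = 0
    for i, ch in enumerate(text):
        if ch == "。" or (ch == "," and is_implicit_period(text, i)):
            clause = text[start:i].strip()
            if clause:
                # A flagged comma is rewritten as a full stop.
                pieces.append(clause + "。")
            start = i + 1
    tail = text[start:].strip()
    if tail:
        pieces.append(tail)
    return pieces


def translate_long_sentence(text: str,
                            is_implicit_period: CommaClassifier,
                            translate: Translator) -> str:
    """Translate each short sentence separately and join the results."""
    short_sentences = split_at_implicit_periods(text, is_implicit_period)
    return " ".join(translate(s) for s in short_sentences)


def toy_classifier(text: str, i: int) -> bool:
    """Placeholder rule for demonstration only; a trained model would go here."""
    return text[i + 1:i + 3] == "今天"


if __name__ == "__main__":
    sentence = "他昨天去了北京,见了几位老朋友,今天已经回到广州。"
    print(split_at_implicit_periods(sentence, toy_classifier))
```

In this sketch only the second comma is flagged, so the sentence is split into two short sentences while the first comma is kept. A real recognizer would score each comma in context rather than match a fixed word.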