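Research and Implementation of Key Technologies for a Statistics-Based Chinese-Tibetan Machine Translation System (基于统计的汉藏机器翻译系统关键技术研究与实现)  Cited by: 5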

Research on the key technology of Chinese-Tibetan machine translation system based on statistical method


Authors: Choenor (群诺); Nyima-Tashi (尼玛扎西) [1]; Pema-Thashi (完么扎西); Karma-Thashi (嘎玛扎西)

Affiliations: [1] School of Information Science and Technology, Tibet University, Lhasa 850000, China; [2] School of Computer Science Technology, Fudan University, Shanghai 201203, China; [3] Nationality Normal College, Qinghai Normal University, Xining 810008, China

Source: Plateau Science Research (《高原科学研究》), 2018, No. 2, pp. 97-104 (8 pages)

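Funding: National Key R&D Program of China (2017YFB1402200); Major Science and Technology Special Project of the Tibet Autonomous Region Science and Technology Plan (ZDZX2017000136); Key Project of the Tibet Autonomous Region Science and Technology Plan (2015XZ01G25)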

Abstract: With the rapid spread of statistical machine learning methods, machine translation has advanced quickly, yet research on Chinese-Tibetan machine translation systems is still at an early stage. This paper studies and extends existing statistical translation models and adds handling for the particular features of Tibetan grammar, including verb tense, verb transitivity, and case particles. To address the data sparsity caused by the shortage of parallel corpora, the word translation model based on an intermediary (pivot) language is improved, and the pivot-based statistical translation model is combined with the direct translation model. A "less supervision" approach is applied to reduce the blindness, inefficiency, redundancy, and superficiality of statistical machine translation model training, and incorporating this approach into the existing training procedure yields an improved training method.

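Keywords: Chinese-Tibetan machine translation; reordering algorithm; tree-to-string translation model; automatic word segmentation and tagging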

Classification number: H085 [Language and Writing: Linguistics]
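
The abstract describes combining a word translation model built through an intermediary language with a direct translation model, but gives no formulas. The following is a minimal sketch of one common way such a combination can be done, not the authors' method: pivot probabilities are obtained by marginalizing over pivot words, p(t|s) = Σ_p p(p|s)·p(t|p), and are then linearly interpolated with the direct model. The function names, the weight `lam`, the choice of linear interpolation, and the toy tokens (`zh_w1`, `en_a`, `bo_x`, and so on) are all assumptions made for illustration.

```python
from collections import defaultdict


def pivot_translation_probs(src_to_pivot, pivot_to_tgt):
    """Marginalize over pivot words: p(t|s) = sum_p p(p|s) * p(t|p)."""
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot words with no target-side entry contribute nothing.
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                table[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in table.items()}


def interpolate(direct, pivot, lam=0.5):
    """Linear interpolation: lam * p_direct + (1 - lam) * p_pivot."""
    merged = {}
    for s in set(direct) | set(pivot):
        d, q = direct.get(s, {}), pivot.get(s, {})
        merged[s] = {
            t: lam * d.get(t, 0.0) + (1 - lam) * q.get(t, 0.0)
            for t in set(d) | set(q)
        }
    return merged


# Hypothetical toy tables (placeholder tokens, not real corpus data).
src_to_pivot = {"zh_w1": {"en_a": 0.75, "en_b": 0.25}}
pivot_to_tgt = {"en_a": {"bo_x": 1.0}, "en_b": {"bo_x": 0.5, "bo_y": 0.5}}
direct = {"zh_w1": {"bo_x": 0.5, "bo_z": 0.5}}

pivot = pivot_translation_probs(src_to_pivot, pivot_to_tgt)
combined = interpolate(direct, pivot, lam=0.5)
print(combined["zh_w1"])
```

In this toy case the pivot path supplies a candidate (`bo_y`) that the sparse direct table never observed, while the direct table keeps a candidate (`bo_z`) the pivot path misses; the interpolated distribution still sums to 1 because both inputs do.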

 
