Chinese-Tibetan Phrase Extraction (cited by: 5)

Authors: NUO Minghua[1,2], ZHANG Liqiang[1], LIU Huidan[1,2], WU Jian[1], DING Zhiming[1]

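Affiliations: [1] Institute of Software, Chinese Academy of Sciences, Beijing 100190; [2] Graduate University of Chinese Academy of Sciences, Beijing 100049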

Source: Journal of Chinese Information Processing (《中文信息学报》), 2011, No. 2, pp. 105-110, 121 (7 pages in total)

Funding: Supported by the Chinese Academy of Sciences "Western Action Plan High-Tech Project" (KGCX2-YW-512)

Abstract: This paper describes a method for extracting bilingual phrase pairs from a domain-specific Chinese-Tibetan parallel corpus of laws, regulations, and official documents. Widely used phrase extraction methods depend heavily on word alignment results or on additional resources such as part-of-speech tagging or syntactic analysis. Because such resources are currently scarce for Tibetan, the paper proposes a two-step Chinese-Tibetan phrase pair extraction method. The first step extracts valid Chinese phrases (multi-word chunks) using Nagao's algorithm and a substring reduction algorithm; this step is not the focus of the paper. The second step obtains the translation of each Chinese phrase to be translated, using a proposed Tibetan word sequence intersection algorithm (TIA) to extract the Tibetan phrase. The authors report that TIA works well for both 1-1 and 1-n translations, whether the Tibetan phrase is continuous or discontinuous.
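The abstract names TIA but does not spell out its steps, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the Tibetan side of every sentence pair is already tokenized, that the candidate translation is whatever the Tibetan sentences paired with the Chinese phrase have in common, and that "intersection" can be approximated by an order-preserving longest common subsequence (which allows gaps, hence discontinuous phrases). The function names `lcs` and `intersect_translations` and the placeholder tokens are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two token lists (order kept, gaps allowed)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one common subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def intersect_translations(phrase, corpus):
    """Assumed reading of TIA: intersect the Tibetan token sequences of all
    sentence pairs whose Chinese side contains `phrase`.

    corpus: list of (chinese_sentence, tibetan_tokens) pairs.
    """
    hits = [tib for zh, tib in corpus if phrase in zh]
    if not hits:
        return []
    return reduce(lcs, hits)


# Placeholder tokens stand in for real Tibetan words.
corpus = [
    ("X abc Y", ["a", "p", "q", "b"]),
    ("Z abc", ["c", "p", "d", "q"]),
    ("unrelated", ["p", "q"]),
]
candidate = intersect_translations("abc", corpus)
```

With only two matching sentences the intersection can still contain coincidental shared tokens; a practical system would presumably need more sentence pairs or a frequency threshold, which the abstract does not describe.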

Keywords: Chinese-Tibetan phrase extraction; Tibetan information processing; Chinese information processing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
