Research on Literature Term Extraction Based on Parallel Corpora  Times cited: 1

Research on terms extraction in literature technical based on parallel corpus


Author: ZHONG Yufeng (钟玉峰) [1]

Affiliation: [1] Department of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, Heilongjiang 150050, China

Source: Journal of Heilongjiang Institute of Technology (《黑龙江工程学院学报》), 2011, No. 4, pp. 60-62, 71 (4 pages in total)

Abstract: The paper first introduces the importance and distribution of literature technical terms and summarizes the term extraction methods in common use. It then proposes an algorithm for automatically extracting bilingual terms from an English-Chinese parallel corpus. The parallel corpus is aligned at the sentence level, mainly with an improved statistical method based on character length, and the English and Chinese texts are each tagged with part-of-speech categories. The nouns and noun phrases in the aligned and tagged bilingual corpus are counted to produce a candidate term set. Next, for each English candidate term, the translation probability between the term and its related Chinese translations is calculated. Finally, term extraction experiments are carried out on a parallel corpus of the Regulations for the Implementation of the Copyright Law of the People's Republic of China.
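The abstract names a sentence-alignment step based on character length but does not give the author's improved formula. As a point of reference only, the sketch below implements the classic length-based dynamic-programming aligner in the style of Gale and Church (1993). The bead priors, the variance `s2 = 6.8` and the function names `align` and `length_cost` are assumptions for illustration; the length ratio `c` (Chinese characters per English character) is estimated from the corpus totals unless given, and all of these would need re-estimation on real English-Chinese data.

```python
import math

# Approximate bead priors from Gale & Church (1993), split evenly between
# symmetric bead types. ASSUMED values; an English-Chinese corpus may differ.
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.005,
    (0, 1): 0.005,
    (2, 1): 0.0445,
    (1, 2): 0.0445,
    (2, 2): 0.011,
}


def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def length_cost(len_en, len_zh, c, s2):
    """-log of the two-sided probability of the observed length difference."""
    if len_en == 0 and len_zh == 0:
        return 0.0
    mean = (len_en + len_zh / c) / 2.0
    delta = (len_zh - len_en * c) / math.sqrt(mean * s2)
    p = 2.0 * (1.0 - _norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))


def align(en_lens, zh_lens, c=None, s2=6.8):
    """Align sentences by length; returns beads of (en_indices, zh_indices)."""
    if c is None:
        # Expected Chinese characters per English character, from corpus totals.
        c = sum(zh_lens) / sum(en_lens)
    n, m = len(en_lens), len(zh_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every cell after all of its predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = length_cost(sum(en_lens[i:ni]), sum(zh_lens[j:nj]), c, s2)
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, with English sentence lengths `[40, 80, 30]` and Chinese lengths `[13, 27, 10]`, `align` pairs the sentences one-to-one, while `[40, 40, 80]` against `[27, 27]` merges the first two English sentences into a single 2-1 bead.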
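The abstract does not state how the translation probability between an English candidate term and its Chinese translations is computed. A minimal sketch, assuming a simple relative-frequency estimate over aligned sentence pairs, is p(c | e) = (number of aligned pairs containing both e and c) / (number of aligned pairs containing e). The input format (per-pair lists of noun and noun-phrase candidates, already produced by POS tagging) and the names `translation_probabilities` and `best_translation` are hypothetical; this is a stand-in technique, not the paper's formula.

```python
from collections import Counter
from itertools import product


def translation_probabilities(aligned_pairs):
    """Estimate p(zh | en) from (english_candidates, chinese_candidates) pairs.

    Each aligned sentence pair contributes at most one count per term and
    per (English term, Chinese term) combination.
    """
    en_count = Counter()
    pair_count = Counter()
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_count.update(en_set)
        pair_count.update(product(en_set, zh_set))
    probs = {}
    for (en, zh), n in pair_count.items():
        probs.setdefault(en, {})[zh] = n / en_count[en]
    return probs


def best_translation(probs, en):
    """Return (chinese_term, probability) with the highest p(zh | en), or None."""
    candidates = probs.get(en, {})
    if not candidates:
        return None
    return max(candidates.items(), key=lambda item: item[1])
```

With pairs such as `(["copyright", "work"], ["著作权", "作品"])`, `(["copyright", "author"], ["著作权", "作者"])` and `(["work"], ["作品"])`, "copyright" gets p(著作权 | copyright) = 1.0 and p(作品 | copyright) = 0.5. Ties (as for "author" here) are broken arbitrarily by `max`, so a real system would need a further criterion such as a co-occurrence threshold or an association score.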

Keywords: term extraction; parallel corpus; algorithm; translation

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
