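Author: Zhong Yufeng (钟玉峰) [1]
Affiliation: [1] Department of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, Heilongjiang 150050, China
Source: Journal of Heilongjiang Institute of Technology (《黑龙江工程学院学报》), 2011, No. 4, pp. 60-62, 71 (4 pages)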
Abstract: The paper first introduces the importance and distribution of technical terms in the literature and summarizes the commonly used term extraction methods. It then proposes an algorithm for automatically extracting bilingual terms from an English-Chinese parallel corpus. The parallel corpus is aligned at the sentence level, mainly by an improved statistical method based on character length, and the English and Chinese texts are each tagged with part-of-speech categories. The candidate term set is generated by counting the nouns and noun phrases in the aligned, tagged bilingual corpus. For each English candidate term, the translation probability between it and each related Chinese translation is then calculated. Finally, a term extraction experiment is carried out on a parallel corpus of the Regulations for the Implementation of the Copyright Law of the People's Republic of China.
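The translation-probability step can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so this example assumes a simple relative co-occurrence estimate: the fraction of aligned sentence pairs containing an English candidate that also contain a given Chinese candidate. The function name `translation_probabilities` and the toy data are hypothetical, and candidate extraction (alignment and part-of-speech tagging) is assumed to have been done already.

```python
from collections import Counter, defaultdict


def translation_probabilities(aligned_pairs):
    """Estimate P(zh | en) by co-occurrence in aligned sentence pairs.

    aligned_pairs: iterable of (english_candidates, chinese_candidates),
    one tuple per aligned sentence pair. This estimator is an assumption
    made for illustration, not the formula used in the paper.
    """
    en_counts = Counter()          # sentence pairs containing each English term
    cooc = defaultdict(Counter)    # co-occurrence counts: cooc[en][zh]
    for en_terms, zh_terms in aligned_pairs:
        for en in set(en_terms):   # count each term once per sentence pair
            en_counts[en] += 1
            for zh in set(zh_terms):
                cooc[en][zh] += 1
    return {
        en: {zh: n / en_counts[en] for zh, n in zh_counts.items()}
        for en, zh_counts in cooc.items()
    }


# Toy aligned candidate lists (hypothetical data, not from the paper's corpus).
pairs = [
    (["copyright", "work"], ["著作权", "作品"]),
    (["copyright"], ["著作权"]),
    (["work", "author"], ["作品", "作者"]),
]
probs = translation_probabilities(pairs)
best = {en: max(zh_probs, key=zh_probs.get) for en, zh_probs in probs.items()}
```

With this toy input, "copyright" pairs most strongly with 著作权 and "work" with 作品. A real system would need more data and a tie-breaking rule, since a term seen in only one sentence pair scores every co-occurring candidate equally.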
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]