A Study of Automatic Acquisition of Chinese Compound Noun Phrases Based on Corpus


Authors: Wang Meng (王萌)[1], Zhu Hong (朱虹)[2], Xu Ge (徐戈)[3]

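Affiliations: [1] Department of Educational Technology, School of Humanities, Jiangnan University, Wuxi, Jiangsu 214122; [2] China National Institute of Standardization, Beijing 100191; [3] Department of Computer Science, Minjiang University, Fuzhou, Fujian 350108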

Source: Journal of Leshan Normal University (《乐山师范学院学报》), 2014, No. 12, pp. 57-62 (6 pages)

Abstract: Chinese compound noun phrases are common in texts of all genres, and the great majority of them occur with low frequency, which makes their automatic acquisition very challenging. Because statistical indexes cannot reliably identify low-frequency compound noun phrases, this paper proposes a new approach that models acquisition as a classification problem: statistical indexes are used to obtain typical, high-frequency compound noun phrases as training data, and a variety of features are then extracted to help discover the low-frequency ones. Experimental results show that the approach is effective.
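The abstract describes the approach only at a high level: statistical indexes select typical, high-frequency compounds as training data, and a classifier trained on extracted features then finds the low-frequency ones. The following is a minimal Python sketch of that idea under loudly stated assumptions, not the authors' implementation. It assumes the input is already word-segmented and restricted to noun sequences; pointwise mutual information (PMI) stands in for the paper's unspecified statistical indexes; the PMI thresholds, the four features, the toy corpus, and all function names (`pmi`, `dot`, `train_linear_svm`, `features`, `acquire`) are hypothetical; and the SVM is a plain linear SVM trained with the Pegasos subgradient method rather than any particular SVM package.

```python
import math
import random
from collections import Counter, defaultdict


def pmi(pair_count, c1, c2, n_pairs, n_words):
    """Pointwise mutual information (log base 2) of an adjacent word pair."""
    p_pair = pair_count / n_pairs
    p1, p2 = c1 / n_words, c2 / n_words
    return math.log2(p_pair / (p1 * p2))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin check uses the current w
            w = [(1 - eta * lam) * wi for wi in w]  # shrink step from the L2 term
            if violated:
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
    return w


def features(w1, w2, left, right):
    """Hypothetical frequency-independent cues for a candidate pair (w1, w2)."""
    return [
        float(len(w1)),              # modifier length in characters
        float(len(w2)),              # head length in characters
        math.log1p(len(right[w1])),  # distinct words seen after w1
        math.log1p(len(left[w2])),   # distinct words seen before w2
        1.0,                         # constant feature acting as a bias
    ]


def acquire(sentences, min_freq=2, hi=1.0, lo=0.0):
    """Label frequent pairs by PMI, train an SVM, classify the infrequent pairs."""
    words = Counter(w for s in sentences for w in s)
    pairs = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    n_words, n_pairs = sum(words.values()), sum(pairs.values())
    left, right = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        right[a].add(b)
        left[b].add(a)
    X, y, rare = [], [], []
    for (a, b), c in pairs.items():
        if c < min_freq:
            rare.append((a, b))  # low-frequency pair: left for the classifier
            continue
        score = pmi(c, words[a], words[b], n_pairs, n_words)
        if score >= hi or score <= lo:  # pairs between the thresholds are skipped
            X.append(features(a, b, left, right))
            y.append(1 if score >= hi else -1)
    if len(set(y)) < 2:
        return []  # both positive and negative seeds are needed to train
    w = train_linear_svm(X, y)
    return [p for p in rare if dot(w, features(p[0], p[1], left, right)) > 0]


if __name__ == "__main__":
    # Toy, hypothetical pre-segmented noun sequences (not data from the paper).
    corpus = [
        ["自然", "语言", "处理", "技术"],
        ["自然", "语言", "处理", "系统"],
        ["信息", "检索", "系统"],
        ["信息", "检索", "技术"],
        ["语言", "信息", "检索"],
        ["文本", "分类", "系统"],
    ]
    print(acquire(corpus, min_freq=2))
```

The key design point is that the PMI index is only trusted where counts are high enough to be reliable; the rare pairs are never scored by PMI at all, only by the classifier, whose features deliberately avoid the pair's own frequency.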

Keywords: compound noun phrase; automatic acquisition; SVM; statistical index

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
