Compound word extraction based on location tag and POS (cited by: 3)

Source: Application Research of Computers (《计算机应用研究》), 2016, No. 4, pp. 1062-1065 (4 pages)

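Funding: National Natural Science Foundation of China (61472132); Hunan Province Major Project for the Transformation of Scientific and Technological Achievements through Industry-University-Research Cooperation (2010XK6024); National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2012ZX01045-004-005-002)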

Abstract: Existing word segmentation systems cannot incorporate new words in a timely manner, so they fail to identify domain-specific compound words effectively. To address this problem, the paper proposes a compound word extraction method that combines location tags with part of speech (POS). First, the method builds a location tag set for each term by preprocessing the corpus text, adding location tags, and filtering terms by weighted term frequency. Next, it uses the location tag sets to compute the adjacent degree of terms within a sentence and judges compound words on that basis. Finally, it formulates reverse rules to filter the extraction results, and it re-examines garbage strings by progressively removing terms from the head and the tail in order to identify further compound words. Experiments on different corpora show that the method achieves higher precision.
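The first two steps of the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the adjacent degree, the weighting used in the frequency filter, or the reverse rules, so everything below is an assumption made for illustration: the function names (`build_location_tags`, `adjacency`, `extract_compounds`), the plain frequency threshold `min_freq`, the ratio used as adjacent degree, and the cut-off `threshold` are all hypothetical. POS information, reverse-rule filtering, and the trimming of garbage strings are left out.

```python
from collections import defaultdict


def build_location_tags(sentences, min_freq=2):
    """Map each term to the set of (sentence_index, position) pairs where it occurs.

    Assumption: a plain frequency threshold stands in for the paper's
    weighted term-frequency filter, whose weights the abstract does not give.
    """
    tags = defaultdict(set)
    for s_idx, tokens in enumerate(sentences):
        for pos, term in enumerate(tokens):
            tags[term].add((s_idx, pos))
    return {term: locs for term, locs in tags.items() if len(locs) >= min_freq}


def adjacency(tags, left, right):
    """Hypothetical adjacent degree of the ordered pair (left, right).

    Counts how often `right` sits directly after `left` in the same sentence,
    divided by the occurrence count of the rarer of the two terms.
    """
    left_locs = tags.get(left, set())
    right_locs = tags.get(right, set())
    if not left_locs or not right_locs:
        return 0.0
    together = sum((s_idx, pos + 1) in right_locs for s_idx, pos in left_locs)
    return together / min(len(left_locs), len(right_locs))


def extract_compounds(sentences, min_freq=2, threshold=0.8):
    """Return adjacent term pairs whose adjacent degree reaches `threshold`."""
    tags = build_location_tags(sentences, min_freq)
    found = set()
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:]):
            if adjacency(tags, left, right) >= threshold:
                found.add((left, right))
    return found


# Toy, already-segmented corpus (made up for this example).
sentences = [
    ["we", "build", "location", "tag", "sets"],
    ["each", "location", "tag", "marks", "a", "position"],
    ["a", "location", "tag", "is", "stored"],
]
print(extract_compounds(sentences))  # {('location', 'tag')}
```

Storing positions as sets makes the "is `right` immediately after `left`" check a constant-time lookup, which is one plausible reason to keep location tags rather than raw co-occurrence counts. The pair ("a", "location") is rejected here because only one of the two occurrences of "a" is followed by "location".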

Keywords: compound word extraction; location tag set; adjacent degree; reverse-rule filtering; new word discovery

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
