Substring reduction algorithm based on independence statistic

Cited by: 1

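Affiliations: [1] School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094; [2] Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing 100097; [3] Department of Computer Science, Ningbo Polytechnic, Ningbo, Zhejiang 315800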

Source: Computer Engineering and Applications (《计算机工程与应用》), 2010, No. 24, pp. 129-131 (3 pages)

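Funding: National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z152; No. 2006AA010109); National Natural Science Foundation of China (No. 60672149); Key Science and Technology Project of the Ningbo Science and Technology Bureau (No. 2007A310001)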

Abstract: The substring reduction algorithms currently in use mainly target substrings that have the same frequency as their parent string, and they handle them in a one-to-one mode. However, segmenting text with a lexical (morphological) analysis tool inevitably produces many segmentation fragments, which in turn give rise to many meaningless substrings. Based on an analysis of the one-to-many relationship between these meaningless substrings and their multiple parent strings, a substring reduction algorithm based on an independence statistic is proposed to filter them out. Finally, the algorithm is applied in a Chinese multi-word term extraction system, raising the precision of the extraction results from 91.3% to 93.32%.
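The abstract contrasts the usual one-to-one, same-frequency rule with a one-to-many view, but it does not give the independence statistic itself. The sketch below is a minimal illustration under stated assumptions, not the authors' method: `same_frequency_reduction` drops a substring when some longer candidate containing it has exactly the same frequency, while `independence` is a hypothetical stand-in statistic defined here as the share of a substring's corpus occurrences that are not covered by any longer candidate (parent) string. The function names and the threshold of 0.2 are illustrative assumptions.

```python
def find_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence."""
    starts, i = [], text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def same_frequency_reduction(freq):
    """One-to-one rule: drop s if a longer candidate p contains s and freq(p) == freq(s)."""
    return {s: f for s, f in freq.items()
            if not any(p != s and s in p and freq[p] == f for p in freq)}


def independence(s, candidates, corpus):
    """Hypothetical statistic: fraction of occurrences of s lying outside every parent.

    A parent is any other candidate that contains s as a substring, so one
    substring may be checked against many parents at once (one-to-many).
    """
    parents = [p for p in candidates if p != s and s in p]
    total = independent = 0
    for sent in corpus:
        # Character spans [start, end) covered by some parent occurrence.
        spans = [(j, j + len(p)) for p in parents for j in find_all(sent, p)]
        for i in find_all(sent, s):
            total += 1
            if not any(a <= i and i + len(s) <= b for a, b in spans):
                independent += 1
    # A candidate that never occurs is treated as having no independence.
    return independent / total if total else 0.0


def independence_reduction(candidates, corpus, threshold=0.2):
    """Keep only candidates whose independence reaches the (assumed) threshold."""
    return [s for s in candidates
            if independence(s, candidates, corpus) >= threshold]


if __name__ == "__main__":
    corpus = ["中文术语抽取系统", "英文术语抽取方法", "中文术语抽取", "术语抽取评测"]
    candidates = ["中文术语抽取", "英文术语抽取", "术语抽取", "文术语"]
    freq = {c: sum(len(find_all(sent, c)) for sent in corpus) for c in candidates}
    # The fragment 文术语 (freq 3) matches no single parent's frequency, so it survives here.
    print(same_frequency_reduction(freq))
    # Every occurrence of 文术语 lies inside a parent, so its independence is 0 and it is dropped.
    print(independence_reduction(candidates, corpus))
```

In this toy corpus the fragment 文术语 is split across two parents (frequencies 2 and 1), so the one-to-one rule cannot remove it, whereas a statistic that pools all parents can. 术语抽取 survives because one of its four occurrences stands alone.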

Keywords: substring reduction; independence statistic; word-segmentation fragments

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]