Research and Implementation of Related Algorithm of Chinese Text Categorization

Source: Journal of Jilin University (Science Edition), 2009, No. 4, pp. 790-794 (5 pages)

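Funding: National Natural Science Foundation of China (Grant No. 60275026); Major Project of the National Key Technology R&D Program of the 11th Five-Year Plan (Grant No. 2006BAK01A33)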

Abstract: Based on an analysis of how ambiguity is handled in Chinese word segmentation, this paper proposes a context-based bidirectional-scanning word segmentation algorithm and improves the segmentation dictionary by adding fixed phrase collocations to it. The paper discusses feature selection and weight assignment, and introduces the χ² statistic into term-weight computation to address the shortcomings of the commonly used TF-IDF weighting scheme. It also proposes a term-scoring classification algorithm that improves the effectiveness of feature terms for text categorization. Experimental results show that the improved weighting method performs better.
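Bidirectional scanning for Chinese segmentation is commonly built on forward and backward maximum matching. The sketch below illustrates that general idea under this assumption; it is not the authors' context-based algorithm, which the abstract does not spell out. The function names are hypothetical, and the tie-breaking rule (fewer words, then fewer single-character words, otherwise the backward result) is a widely used heuristic standing in for the paper's context-based disambiguation. Adding fixed phrase collocations to the dictionary, as the paper does, corresponds here to adding multi-character entries to the `dictionary` set.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: repeatedly take the longest dictionary word from the left."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Backward maximum matching: repeatedly take the longest dictionary word from the right."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Run both scans; if they disagree, pick one with a simple (assumed) heuristic."""
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if fwd == bwd:
        return fwd  # no ambiguity detected

    def score(seg: List[str]) -> tuple:
        # Fewer words first, then fewer single-character words.
        return (len(seg), sum(1 for w in seg if len(w) == 1))

    # Ties go to the backward result.
    return fwd if score(fwd) < score(bwd) else bwd
```

On the classic ambiguous string 研究生命起源 ("study the origin of life") with the dictionary {研究, 研究生, 生命, 起源, 命}, forward matching yields 研究生 / 命 / 起源, backward matching yields 研究 / 生命 / 起源, and the heuristic selects the backward result because it contains fewer single-character words.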

Keywords: text categorization; context; bidirectional scanning; vector space model; weight; feature selection

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
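The abstract states that the χ² statistic is brought into term-weight computation to offset weaknesses of plain TF-IDF, but it does not give the formula. One common way to do this, assumed here, is to multiply tf × log(N/df) by the term's χ² score, taking the maximum χ² over all classes. χ² itself comes from the 2×2 document-count table for term t and class c: χ²(t, c) = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)), where A counts documents in c containing t, B documents outside c containing t, C documents in c without t, and D documents outside c without t. The function names and the max-over-classes aggregation are illustrative assumptions, not the paper's definition.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def chi_square(A: int, B: int, C: int, D: int) -> float:
    """χ² for a term/class pair from the 2x2 contingency table of document counts."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0


def tfidf_chi2_weights(docs: List[Tuple[List[str], str]]) -> List[Dict[str, float]]:
    """Weight each term as tf * log(N/df) * max_c χ²(t, c) (an assumed combination)."""
    N = len(docs)
    df: Counter = Counter()        # term -> number of documents containing it
    class_df: Counter = Counter()  # (term, label) -> documents of that label containing it
    class_size = Counter(label for _, label in docs)
    for tokens, label in docs:
        for t in set(tokens):
            df[t] += 1
            class_df[(t, label)] += 1

    # Collapse per-class χ² into one score per term by taking the maximum.
    chi2: Dict[str, float] = {}
    for t in df:
        best = 0.0
        for c, n_c in class_size.items():
            A = class_df[(t, c)]
            B = df[t] - A
            C = n_c - A
            D = N - n_c - B
            best = max(best, chi_square(A, B, C, D))
        chi2[t] = best

    weights: List[Dict[str, float]] = []
    for tokens, _ in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(N / df[t]) * chi2[t] for t in tf})
    return weights
```

Under this scheme a term spread evenly across classes (for example, 比赛 appearing once in each of two equally sized classes in a four-document toy corpus) gets χ² = 0 and therefore zero weight regardless of its TF-IDF value. This is presumably the kind of class-blindness in TF-IDF that the χ² factor is meant to address.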
