Affiliations: [1] Xinyu College (新余高等专科学校) [2] 338031 Department of Computer Science, Nanchang University
Source: Control & Automation (《微计算机信息》), 2008, No. 27, pp. 268-270 (3 pages)
Abstract: Text classification is an important technique in text data mining and has been widely applied in information management, search engines, recommendation systems, and other fields. Most existing text classification methods are based on the vector space model; these algorithms are computationally expensive and scale poorly to large text collections. To address this, we propose a method for extracting text classification rules that combines a genetic algorithm with information entropy. In this hybrid method, information entropy is used to assist in generating the initial population of the genetic algorithm. Integrating the two effectively greatly improves the classification efficiency of the hybrid method. Experimental results show that the method is applicable to large text data sets and that the extracted rules achieve high classification accuracy and fast classification speed.
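The abstract does not say how information entropy assists the generation of the initial population, so the following is only a minimal sketch of one plausible reading: rank terms by information gain (the reduction in class-label entropy from knowing whether a term occurs) and draw the initial individuals from the top-ranked terms. Representing documents as token sets and individuals as small term sets, as well as the function names `entropy`, `information_gain`, `rank_terms`, and `seed_population`, are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Drop in label entropy from knowing whether `term` occurs in a document.

    `docs` is a list of token sets (an assumed representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """All vocabulary terms, highest information gain first (ties alphabetical)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)


def seed_population(ranked_terms, top_k, size, rng):
    """Hypothetical seeding step: each individual is a set of one or two
    terms sampled from the `top_k` most informative terms."""
    pool = ranked_terms[:top_k]
    max_len = min(2, len(pool))
    return [frozenset(rng.sample(pool, rng.randint(1, max_len))) for _ in range(size)]
```

A caller would pass a seeded `random.Random` instance as `rng` so that the initial population is reproducible. The genetic operators (selection, crossover, mutation) and the fitness function are not described in the abstract and are left out of the sketch.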