Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2010, No. 3, pp. 408-413 (6 pages)
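Funding: Supported by a key project of the National Key Technology R&D Program of the 11th Five-Year Plan (2006BAH03B02) and a National Social Science Fund of China project (06BTQ030)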
Abstract: Feature selection is one of the key techniques in text categorization. This paper proposes a Controllable Feature Selection Algorithm Based on Poisson Estimates (CFSPE). The algorithm uses document frequency estimated under a Poisson assumption as the basis for measuring how much semantic information a feature carries, and it takes the idea of controllable feature selection from rate-distortion theory in the communications field. Experiments on the Reuters-21578 news corpus show that the Poisson-estimate-based feature selection algorithm outperforms the semantics-based WN algorithm and the statistics-based IG and Chi2 algorithms. Moreover, with the feature omission rate taken as the rate-distortion function and a lower bound set on the classifier's effectiveness measure, any desired value of the classification measure can be obtained by adjusting the feature omission rate. Because its central idea stems from rate-distortion theory in communications, the algorithm is also a new attempt at combining ideas across fields.
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
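The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the paper's CFSPE algorithm. It assumes the standard Poisson result that a term with collection frequency cf, scattered independently over N documents, is expected to appear in N(1 - e^(-cf/N)) of them, and it scores each term by how far its observed document frequency falls below that estimate. The function names, the scoring rule, and the `omit_ratio` parameter (used here simply as the fraction of ranked terms to drop) are all hypothetical illustrations.

```python
import math
from collections import Counter


def poisson_expected_df(cf, n_docs):
    """Document frequency expected if cf occurrences were Poisson-scattered over n_docs."""
    return n_docs * (1.0 - math.exp(-cf / n_docs))


def score_terms(docs):
    """Score each term as (Poisson-expected df) - (observed df).

    docs is a list of token lists. A large positive score means the term
    clusters in fewer documents than chance would predict (assumed here to
    signal a content-bearing term; this rule is an illustration only).
    """
    n_docs = len(docs)
    cf = Counter()  # collection frequency: total occurrences of each term
    df = Counter()  # document frequency: number of documents containing the term
    for doc in docs:
        cf.update(doc)
        df.update(set(doc))
    return {t: poisson_expected_df(cf[t], n_docs) - df[t] for t in cf}


def select_features(docs, omit_ratio):
    """Keep the top-scoring terms, dropping a fraction omit_ratio (hypothetical knob)."""
    scores = score_terms(docs)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    keep = max(1, math.ceil(len(ranked) * (1.0 - omit_ratio)))
    return ranked[:keep]


if __name__ == "__main__":
    toy_docs = [["a", "a", "a", "b"], ["b", "c"], ["b"], ["b", "d"]]
    print(select_features(toy_docs, omit_ratio=0.5))
```

In this toy corpus the term "a" occurs three times but only in one document, so it ranks first, while "b" appears once in every document and ranks last. Raising `omit_ratio` shrinks the feature set, which is the kind of control the abstract describes in terms of the feature omission rate.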