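Affiliations: [1] School of Computer Science and Technology, Shandong University, Jinan 250101; [2] School of Information Management, Shandong Economic University, Jinan 250014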
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 1, pp. 103-110 (8 pages)
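Funding: Supported by the National Natural Science Foundation of China (No. 60970047), the Natural Science Foundation of Shandong Province (No. Y2008G19), and the Key Science and Technology Program of Shandong Province (No. 2007GG10001002, 2008GG10001026)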
Abstract: A feature selection method for hierarchical text classification is proposed. First, the concept of category hierarchical correlation degree is introduced and computed from the category tree and the probability distribution of the training data at different levels, which yields the importance of each category in the tree. Then, based on these results, the discriminative ability of each feature with respect to the categories is calculated, and the features with the greatest discriminative ability are selected to form the feature set used for classification. Experiments show that the method outperforms traditional feature selection methods both in the quality of the selected features and on classification measures such as accuracy, F1, and micro-precision.
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]