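Affiliation: [1] People's Public Security University of China, Beijing 100038
Source: Engineering Journal of Wuhan University (《武汉大学学报(工学版)》), 2016, No. 3, pp. 469-475 (7 pages)
Funding: Key Project of Public Security Theory and Soft Science Research, Ministry of Public Security (No. 2013LLYJGADX003)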
Abstract: Short texts have sparse feature spaces and lack context, which makes accurate classification difficult and causes semantic ambiguity. To address this, a feature selection method based on context compensation and marginal relevance is proposed. Correlations between texts are extracted and used to build a context feature set for small sample clusters; the feature space is then reconstructed, and the final feature set is determined by marginal relevance analysis. The process has two stages: 1) computing the context correlation of feature words based on a word-vector semantic quantification model; 2) reorganizing the feature space of the test texts and computing marginal relevance. The proposed method strengthens the compactness of the feature space, reduces redundancy, and lowers feature dimensionality while preserving the properties and structure of the original feature space. Experiments on the standard Reuters-21578 and NewsGroup corpora show that it is more effective than traditional document frequency, information gain, and mutual information methods, and that on both datasets it outperforms general feature selection methods when run with typical classifiers.
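The abstract names maximal marginal relevance (MMR) as the criterion for fixing the final feature set but gives no formula. As a rough illustration only, the sketch below applies the generic greedy MMR rule to feature terms: each step picks the term with the best trade-off between its relevance score and its similarity to terms already chosen. The function names, the `lam` weight, the cosine similarity measure, and the toy scores and vectors are all assumptions made for this example, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(relevance, vectors, k, lam=0.7):
    """Greedily pick up to k terms by generic maximal marginal relevance.

    relevance: term -> relevance score (hypothetical; e.g. a context-correlation score)
    vectors:   term -> word vector used to measure redundancy between terms
    lam:       assumed trade-off weight; 1.0 ignores redundancy entirely
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def score(term):
            # Redundancy is the highest similarity to any already-selected term.
            redundancy = max(
                (cosine(vectors[term], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[term] - (1 - lam) * redundancy

        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "b" is almost a duplicate of "a", while "c" is orthogonal to both.
toy_relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
toy_vectors = {"a": [1.0, 0.0], "b": [1.0, 0.01], "c": [0.0, 1.0]}
print(mmr_select(toy_relevance, toy_vectors, k=2, lam=0.5))  # ['a', 'c']
```

With `lam=0.5` the near-duplicate term `b` is skipped in favor of the less relevant but non-redundant `c`, which is the redundancy-reduction effect the abstract describes; with `lam=1.0` the selection falls back to plain relevance ranking.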
Classification number: TH133 [Mechanical Engineering: Mechanical Manufacturing and Automation]