Method of Short Text Classification Based on Frequent Item Feature Extension  Cited by: 10

Authors: JIN Yi-fan (靳一凡); FU Ying-xun (傅颖勋); MA Li (马礼) (College of Information, North China University of Technology, Beijing 100144, China)

Affiliation: [1] College of Information, North China University of Technology

Source: Computer Science (《计算机科学》), 2019, Issue B06, pp. 478-481 (4 pages)

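Funding: National Natural Science Foundation of China (61702013); Beijing Outstanding Talent Training Funding Project (2016000020124G016); Science and Technology Program of the Beijing Municipal Education Commission (KM201710009008); North China University of Technology Research Start-up Project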

Abstract: Short texts yield high-dimensional, sparse feature vectors, so traditional classification methods perform poorly on them. To address this problem, the paper proposes a short text classification method based on frequent item feature extension (STCFIFE). First, the FP-growth algorithm mines frequent itemsets from a background corpus, and the weights of the extended features are computed by combining contextual association features. The new features are then added to the feature space of the original short text, and an SVM (Support Vector Machine) classifier is trained on the extended representation and used for classification. Experimental results show that, compared with the traditional SVM algorithm and the LDA+KNN algorithm, STCFIFE effectively alleviates the problems of insufficient features and high-dimensional sparsity in short texts, raising the F1 score by 2%~10% and improving short text classification performance.
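The feature-extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force count of co-occurring term pairs stands in for FP-growth (which would yield the same frequent 2-itemsets, only more efficiently), and the weight given to an added term is assumed to be the rule confidence count(a, b) / count(a), since the abstract does not state the paper's weighting formula. The function names `mine_frequent_pairs` and `extend_features` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_pairs(corpus, min_support):
    """Count term pairs that co-occur in at least `min_support` documents.

    Brute-force stand-in for FP-growth, restricted to 2-itemsets.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in corpus:
        terms = sorted(set(doc))  # sort so each pair has one canonical order
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    return term_counts, frequent


def extend_features(short_text, term_counts, frequent_pairs):
    """Add co-occurring terms to a short text's bag-of-words features.

    Original terms keep weight 1.0. An added term b gets the highest
    confidence count(a, b) / count(a) over the original terms a
    (an assumed weighting, not the paper's formula).
    """
    features = {t: 1.0 for t in short_text}
    for (a, b), count in frequent_pairs.items():
        for src, dst in ((a, b), (b, a)):
            if src in short_text and dst not in short_text:
                weight = count / term_counts[src]
                features[dst] = max(features.get(dst, 0.0), weight)
    return features


# Toy background corpus: each document is a list of already-segmented terms.
background = [
    ["apple", "iphone", "release"],
    ["apple", "iphone", "price"],
    ["apple", "fruit", "juice"],
    ["iphone", "price", "drop"],
]
term_counts, pairs = mine_frequent_pairs(background, min_support=2)
extended = extend_features(["iphone", "sale"], term_counts, pairs)
```

Here the short text "iphone sale" gains the terms "apple" and "price", each with a weight below that of the original terms. In the pipeline the abstract describes, such extended feature dictionaries would then be vectorized and used to train the SVM classifier.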

Keywords: short text classification; feature extension; frequent item mining; feature weight; support vector machine

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
