Feature Extraction and Clustering of Terrorism Short Text Based on Sparse Auto-Encoder

Cited by: 5


Authors: Huang Wei; Huang Jianqiao; Li Yuefeng (School of Economics and Management, Hubei University of Technology, Wuhan 430064)

Affiliation: [1] School of Economics and Management, Hubei University of Technology, Wuhan 430064

Source: Journal of Intelligence (情报杂志), 2019, No. 3, pp. 203-206, I0001, 186 (6 pages)

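Funding: One of the research outputs of the National Natural Science Foundation of China projects "Multi-Kernel Methods for Real-Time Active Sensing of Online Public Opinion Events in the Microblog Environment" (No. 71303075) and "Unsupervised Text Classification Based on Feature Ontology Learning in the Big Data Environment" (No. 71571064)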

Abstract: [Purpose/Significance] The sparse auto-encoder is a relatively efficient text feature extraction method from deep learning. It helps address the high dimensionality and sparsity that make large-scale terrorism-related short texts difficult to process. [Method/Process] The sparse auto-encoder is first trained without supervision to reduce dimensionality and extract the latent features of the data. The LDA topic clustering algorithm is then applied to cluster the texts, and the effectiveness and efficiency of the method are verified by comparing its experimental results with those of traditional feature extraction algorithms. [Result/Conclusion] The experimental results show that using the text features extracted by the sparse auto-encoder for LDA topic clustering effectively addresses the high dimensionality, sparsity, and heavy noise of terrorism-related short texts, and significantly improves the accuracy of the clustering results.
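The first stage described in the abstract, unsupervised dimensionality reduction with a sparse auto-encoder, can be illustrated with a minimal sketch. This record does not give the authors' architecture, loss function, or hyperparameters, so the code below uses the common textbook formulation (one sigmoid hidden layer, squared reconstruction error, and a KL-divergence sparsity penalty) as an assumption. All names (`train_sparse_autoencoder`, `forward`, `X_toy`) and the toy bag-of-words data are hypothetical, and the later LDA topic clustering stage is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Return (hidden activations, reconstruction) for one input vector."""
    W1, b1, W2, b2 = params
    a = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    x_hat = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + b) for row, b in zip(W2, b2)]
    return a, x_hat


def train_sparse_autoencoder(X, n_hidden, rho=0.1, beta=1.0, lr=0.5, epochs=300, seed=0):
    """Full-batch gradient descent on reconstruction error + beta * KL sparsity penalty.

    rho is the target mean activation of each hidden unit (assumed value).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(d)]
    b1, b2 = [0.0] * n_hidden, [0.0] * d
    losses = []
    for _ in range(epochs):
        passes = [forward(x, (W1, b1, W2, b2)) for x in X]
        # Mean activation of each hidden unit over the whole batch.
        rho_hat = [sum(a[j] for a, _ in passes) / n for j in range(n_hidden)]
        recon = sum((xh - xi) ** 2
                    for x, (_, x_hat) in zip(X, passes)
                    for xh, xi in zip(x_hat, x)) / (2 * n)
        kl = sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
                 for r in rho_hat)
        losses.append(recon + beta * kl)
        # Derivative of the KL penalty with respect to each mean activation.
        sparse = [beta * (-rho / r + (1 - rho) / (1 - r)) for r in rho_hat]
        gW1 = [[0.0] * d for _ in range(n_hidden)]
        gW2 = [[0.0] * n_hidden for _ in range(d)]
        gb1, gb2 = [0.0] * n_hidden, [0.0] * d
        for x, (a, x_hat) in zip(X, passes):
            # Backpropagate through the sigmoid output and hidden layers.
            d_out = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
            d_hid = [(sum(W2[k][j] * d_out[k] for k in range(d)) + sparse[j])
                     * a[j] * (1 - a[j]) for j in range(n_hidden)]
            for k in range(d):
                gb2[k] += d_out[k]
                for j in range(n_hidden):
                    gW2[k][j] += d_out[k] * a[j]
            for j in range(n_hidden):
                gb1[j] += d_hid[j]
                for i in range(d):
                    gW1[j][i] += d_hid[j] * x[i]
        # Gradient step using the batch-averaged gradients.
        for j in range(n_hidden):
            b1[j] -= lr * gb1[j] / n
            for i in range(d):
                W1[j][i] -= lr * gW1[j][i] / n
        for k in range(d):
            b2[k] -= lr * gb2[k] / n
            for j in range(n_hidden):
                W2[k][j] -= lr * gW2[k][j] / n
    return (W1, b1, W2, b2), losses


# Toy binary bag-of-words vectors (made up): 6 short "documents", 6-word vocabulary.
X_toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
params, losses = train_sparse_autoencoder(X_toy, n_hidden=2)
# Hidden activations serve as the reduced, low-dimensional features.
features = [forward(x, params)[0] for x in X_toy]
```

In this sketch each 6-dimensional document vector is compressed to 2 hidden activations. In a pipeline like the one the abstract describes, such reduced features would then be handed to the clustering stage. A practical implementation would more likely use a numerical library and mini-batch training than the pure-Python loops shown here.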

Keywords: terrorism-related text; sparse auto-encoder; feature extraction; LDA topic clustering

Classification code: C931.6 [Economics and Management: Management Science]

 
