A Theme Feature Selection Algorithm Based on Words Meaning Dimension Reduction  (Cited by: 1)


Authors: Xiao Lei (肖雷)[1], Wang Xu (王旭)[1], Su Wulin (粟武林)

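Affiliations: [1] College of Electronic and Information Engineering, Hebei University, Baoding 071000, Hebei, China; [2] College of Mathematics and Computer Science, Hebei University, Baoding 071000, Hebei, China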

Source: Computer Applications and Software (《计算机应用与软件》), 2016, No. 3, pp. 244-247, 263 (5 pages in total)

Funding: National Natural Science Foundation of China (60903089); Hebei University Doctoral Project (Y2009157)

Abstract: In text feature selection, the word probability space differs from the word-sense probability space, so topic features based entirely on word probabilities often fail to express the ideas of a document well and are not conducive to text classification. To make topic features better reflect a document's ideas, this paper proposes a topic feature selection algorithm based on word-sense dimension reduction. The algorithm builds a "synonym table" on top of Cilin (词林) to serve as the mapping matrix from words to word senses, constructs a probability distribution over word senses, and then extracts text features with LDA for classification, which significantly improves classification accuracy. Experiments show that a topic model built in this way has a stronger topic representation dimension, and that the algorithm largely resolves the discrepancy between word probabilities and word-sense probabilities in text feature extraction.
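The word-to-sense mapping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the synonym table below is hypothetical (its entries and sense IDs are invented for illustration and are not taken from Cilin), and it assumes each word belongs to exactly one sense group. It shows only how merging synonyms shrinks the vocabulary and changes the probability distribution. The LDA step is omitted.

```python
from collections import Counter

# Hypothetical synonym table: each word maps to one sense (synonym-group) ID.
# Entries and IDs are illustrative assumptions, not data from Cilin.
SYNONYM_TABLE = {
    "car": "S_vehicle",
    "automobile": "S_vehicle",
    "auto": "S_vehicle",
    "doctor": "S_physician",
    "physician": "S_physician",
    "hospital": "S_hospital",
}


def to_senses(tokens, table):
    """Replace each token with its sense ID; unknown words are kept as-is."""
    return [table.get(tok, tok) for tok in tokens]


def distribution(items):
    """Return the empirical probability distribution over the given items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


doc = ["car", "automobile", "auto", "doctor", "physician", "hospital"]

word_dist = distribution(doc)                               # over words
sense_dist = distribution(to_senses(doc, SYNONYM_TABLE))    # over senses

print(len(word_dist), "word dimensions ->", len(sense_dist), "sense dimensions")
print(sense_dist)
```

In this toy document every word occurs once, so no single word stands out, while the sense-level distribution concentrates mass on the shared "vehicle" sense. Sense-level counts of this kind could then be passed to any LDA implementation in place of raw word counts.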

Keywords: LDA; topic model; topic representation dimension

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
