A short text feature extraction method combining term co-occurrence distance and category information (cited by: 3)


Authors: MA Hui-fang (马慧芳), XING Yu-ying (邢玉莹), WANG Shuang (王双), ZHANG Xu-peng (张旭鹏)

Affiliations: [1] College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China; [2] Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2018, No. 9, pp. 1689-1695 (7 pages)

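Funding: National Natural Science Foundation of China (61762078; 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201705); 2016 Gansu Province College Student Innovation and Entrepreneurship Training Program (201610736040; 201610736041)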

Abstract: Traditional feature weighting methods do not fully consider the semantic relations between terms or how terms are distributed across categories. To address this, we propose a short text feature extraction method that combines term co-occurrence distance with category information. On the one hand, the number of words separating two terms in the same short text is taken as their co-occurrence distance, from which the correlation between the two terms is computed; combining this with how often the two terms co-occur yields an association weight for each term. On the other hand, an improved expected cross entropy is used to compute the weight of a term within a given category. The two weights are integrated to obtain the weight of every term in each category. The terms from all categories are then sorted in descending order of weight, and the top K terms are selected as the new feature term set. Experiments show that the method effectively improves short text feature extraction.
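The pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so several pieces are assumptions. A single co-occurrence at distance d (words in between) is taken to contribute a correlation of 1/(d + 1). A term's association weight is taken to be the sum of its pairwise correlations. The category weight uses the standard per-category expected cross entropy term P(t)·P(c|t)·log(P(c|t)/P(c)) rather than the paper's unspecified "improved" variant. The two weights are fused by a product, and a term found in several categories keeps its best score. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_correlation(docs):
    """Accumulate a correlation score for every pair of distinct terms in the
    same short text. ASSUMPTION: one co-occurrence with d intervening words
    contributes 1 / (d + 1), and repeated co-occurrences add up."""
    corr = defaultdict(float)
    for tokens in docs:
        for i, j in combinations(range(len(tokens)), 2):
            a, b = tokens[i], tokens[j]
            if a == b:
                continue
            distance = j - i - 1  # number of words between the two terms
            corr[frozenset((a, b))] += 1.0 / (distance + 1)
    return corr


def association_weights(corr):
    """ASSUMPTION: a term's association weight is the sum of its pairwise
    correlations with every term it co-occurs with."""
    assoc = defaultdict(float)
    for pair, value in corr.items():
        for term in pair:
            assoc[term] += value
    return assoc


def category_ece(docs, labels):
    """Per-category expected cross entropy term from document frequencies:
    P(t) * P(c|t) * log(P(c|t) / P(c)).
    ASSUMPTION: standard form, not the paper's improved variant."""
    n_docs = len(docs)
    class_docs = Counter(labels)
    term_docs = Counter()
    term_class_docs = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per text
            term_docs[term] += 1
            term_class_docs[(term, label)] += 1
    weights = {}
    for (term, label), n_tc in term_class_docs.items():
        p_t = term_docs[term] / n_docs
        p_c_given_t = n_tc / term_docs[term]
        p_c = class_docs[label] / n_docs
        weights[(term, label)] = p_t * p_c_given_t * math.log(p_c_given_t / p_c)
    return weights


def select_features(docs, labels, k):
    """Fuse the two weights, rank terms from all categories in descending
    order, and keep the top k. ASSUMPTIONS: fusion is a product, and a term
    occurring in several categories keeps its highest score."""
    assoc = association_weights(cooccurrence_correlation(docs))
    best = {}
    for (term, _label), ece in category_ece(docs, labels).items():
        score = assoc.get(term, 0.0) * ece
        best[term] = max(best.get(term, float("-inf")), score)
    # Sort by score descending; break ties alphabetically for a stable result.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _score in ranked[:k]]


docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "airport"],
    ["stock", "market", "crash"],
    ["market", "rally", "stock"],
]
labels = ["travel", "travel", "finance", "finance"]
print(select_features(docs, labels, k=3))  # prints ['flight', 'market', 'stock']
```

In this toy corpus, "flight", "market" and "stock" each appear in two texts of a single category and sit close to many other terms, so both weights favour them. Substituting the paper's actual correlation and improved expected cross entropy formulas would only require changing the helper functions. The ranking and top-K selection stay the same.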

Keywords: short text; co-occurrence distance; expected cross entropy; feature extraction

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
