Short Text Feature Extraction Algorithm Based on Semantic and Graph Structure

Cited by: 7

Authors: MA Hui-fang (马慧芳); LIU Xiao-qian (刘晓倩); MA Lan (马兰); WU Shi-meng (伍诗萌) (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)

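Affiliations: [1] College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070 [2] Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004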

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 9, pp. 1864-1868 (5 pages)

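Funding: Supported by the National Natural Science Foundation of China (61762078, 61363058) and a research project of the Guangxi Key Laboratory of Trusted Software (kx201910)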

Abstract: Existing short text feature extraction algorithms do not fully consider the implicit semantics between terms or the structural features of the text graph. To address this, we propose a short text feature extraction algorithm that integrates term semantics and graph structure. First, a text graph is constructed from the co-occurrence of terms. Second, the similarity between terms is computed in two ways, from their internal and external semantic coupling relations and from the structural features of the text graph, and each similarity is used to weight the edges of the graph. Third, a new random walk approach is designed to combine the two edge weighting schemes and iteratively compute the importance of each node. Finally, the nodes are sorted by importance in descending order, and the top K terms are taken as the final feature term set of the text collection. Experiments on Chinese and English datasets show that the method is feasible and effective.
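The four steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: cosine similarity of co-occurrence vectors stands in for the semantic coupling score, Jaccard overlap of closed neighborhoods stands in for the structural score, and a PageRank-style walk with a mixing weight `alpha` stands in for the paper's random walk. The names `build_graph`, `semantic_weight`, `structural_weight`, and `rank_terms` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_graph(docs):
    """Step 1: co-occurrence graph. Terms sharing a text are linked."""
    cooc = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def semantic_weight(cooc, a, b):
    """Assumed stand-in: cosine of the two co-occurrence count vectors."""
    dot = sum(cooc[a][t] * cooc[b][t] for t in cooc[a] if t in cooc[b])
    norm_a = sqrt(sum(v * v for v in cooc[a].values()))
    norm_b = sqrt(sum(v * v for v in cooc[b].values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def structural_weight(cooc, a, b):
    """Assumed stand-in: Jaccard overlap of closed neighborhoods."""
    near_a = set(cooc[a]) | {a}
    near_b = set(cooc[b]) | {b}
    return len(near_a & near_b) / len(near_a | near_b)


def rank_terms(docs, k=3, alpha=0.5, damping=0.85, iters=50):
    """Steps 2-4: weight edges two ways, mix them in a walk, return top k."""
    cooc = build_graph(docs)
    nodes = sorted(cooc)
    trans = {}
    for a in nodes:
        nbrs = list(cooc[a])
        row = {b: 0.0 for b in nbrs}
        schemes = ((semantic_weight, alpha), (structural_weight, 1 - alpha))
        for scheme, share in schemes:
            w = {b: scheme(cooc, a, b) for b in nbrs}
            total = sum(w.values())
            for b in nbrs:
                # Fall back to a uniform step when a scheme gives no signal.
                row[b] += share * (w[b] / total if total else 1 / len(nbrs))
        trans[a] = row
    score = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for a in nodes:
            for b, p in trans[a].items():
                nxt[b] += damping * score[a] * p
        score = nxt
    # Descending importance; ties are broken alphabetically.
    return sorted(nodes, key=lambda n: (-score[n], n))[:k]


docs = [["graph", "walk"], ["graph", "semantic"],
        ["graph", "text"], ["graph", "feature"]]
print(rank_terms(docs, k=2))
```

Each text is assumed to be already tokenized into a list of terms. Mixing the two row-normalized weightings with `alpha` keeps every row of the transition table summing to one, so the scores stay a probability distribution over terms; the paper may combine the two schemes differently.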

Keywords: semantic coupling; graph structure; short text; random walk; feature extraction

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
