Automatic Keyword Extraction Based on XGBOOST Algorithm in Semantic Similarity Domain  (Cited by: 1)


Authors: WANG Chengzhu; WEI Yinzhen [1] (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074)

Affiliation: [1] Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074

Source: Computer & Digital Engineering, 2020, No. 6, pp. 1300-1303, 1385 (5 pages)

Abstract: Automatic keyword extraction has long been both a basic problem and an active research topic in natural language processing. With the exponential growth of text data and the continuing expansion of application scenarios, extracting keywords efficiently and accurately has drawn increasing attention from researchers. In semantic similarity computation, the quality of keyword extraction on each of the two texts has a major impact on the judgment of whether the texts are similar. This paper proposes an automatic keyword extraction method for the semantic similarity domain that is based on the XGBOOST algorithm and fuses several features, including KL divergence, TF-IDF, part of speech, and word length. Experimental results show that the method performs significantly better than KL divergence alone, TF-IDF alone, and supervised methods based on traditional machine learning algorithms.
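The abstract names the features but not how they are computed, so the following is only a minimal sketch of what a per-word feature vector of this kind might look like. Everything in it is an assumption rather than the paper's method: the function name `keyword_features`, the `(word, pos_tag)` input format, the smoothed IDF formula, and the reading of "KL divergence" as a word's pointwise contribution to the divergence between the document and a background corpus. The classifier itself is left out; vectors like these would be the input to a gradient-boosted model such as XGBOOST.

```python
import math
from collections import Counter


def keyword_features(doc, corpus):
    """Return {word: feature dict} for each distinct word in `doc`.

    doc    : list of (word, pos_tag) pairs for one text (hypothetical format)
    corpus : list of such lists, used as the background collection
    """
    words = [w for w, _ in doc]
    tf = Counter(words)
    n_docs = len(corpus)
    # Document frequency: number of corpus texts that contain the word.
    df = Counter(w for text in corpus for w in {w for w, _ in text})
    # Background unigram counts over the whole corpus.
    bg = Counter(w for text in corpus for w, _ in text)
    bg_total = sum(bg.values())
    pos_of = dict(doc)  # keeps the last tag seen for each word

    feats = {}
    for w, count in tf.items():
        p_doc = count / len(words)
        # Add-one smoothing keeps the background probability nonzero.
        p_bg = (bg[w] + 1) / (bg_total + len(bg))
        # Smoothed IDF (an assumed variant, not taken from the paper).
        idf = math.log((1 + n_docs) / (1 + df[w])) + 1
        feats[w] = {
            "tfidf": p_doc * idf,
            # Assumed reading: the word's term in KL(document || background).
            "kl": p_doc * math.log(p_doc / p_bg),
            "pos": pos_of[w],
            "length": len(w),
        }
    return feats
```

In a supervised setup, each candidate word's feature dictionary would be paired with a label (keyword or not), with the part-of-speech tag encoded numerically before training.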

Keywords: automatic extraction; KL divergence; XGBOOST

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
