Authors: 王成柱 (WANG Chengzhu); 魏银珍 (WEI Yinzhen) [1] (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074)
Source: 《计算机与数字工程》 (Computer & Digital Engineering), 2020, No. 6, pp. 1300-1303, 1385 (5 pages in total)
Abstract: Automatic keyword extraction has long been both a basic problem and an active research topic in natural language processing. With the exponential growth of text data and the continuing expansion of application scenarios, extracting keywords efficiently and accurately has drawn increasing attention from researchers. In semantic similarity computation, the quality of keyword extraction on each of the two texts strongly affects the judgment of whether the texts are similar. This paper proposes an automatic keyword extraction method for semantic similarity computation that is based on the XGBoost algorithm and combines several features, including KL divergence, TF-IDF, part of speech, and word length. Experimental results show that the method performs significantly better than KL divergence alone, TF-IDF alone, and supervised methods based on traditional machine learning algorithms.
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]
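The abstract names the features but not how they are computed, so the following is only an illustrative sketch of how three of them (TF-IDF, a KL-divergence term, and word length) might be derived per candidate word. Everything here is an assumption rather than the paper's definition: the function name `keyword_features`, the smoothed IDF formula, the use of the pointwise term p(w)·log(p(w)/q(w)) against a background corpus as the "KL" feature, and the `smoothing` floor. The part-of-speech feature is omitted because it would need an external tagger.

```python
import math
from collections import Counter


def keyword_features(doc_tokens, corpus_docs, smoothing=1e-6):
    """Return {word: (tfidf, kl_term, length)} for each distinct word.

    Hypothetical helper for illustration; not the paper's feature set.
    doc_tokens:  list of tokens of the document being analysed
    corpus_docs: list of token lists used as the background corpus
    """
    doc_counts = Counter(doc_tokens)
    doc_total = sum(doc_counts.values())

    # Background word distribution q(w) over the whole corpus.
    bg_counts = Counter(tok for doc in corpus_docs for tok in doc)
    bg_total = sum(bg_counts.values())

    # Document frequency: number of corpus documents containing each word.
    n_docs = len(corpus_docs)
    doc_freq = Counter()
    for doc in corpus_docs:
        doc_freq.update(set(doc))

    features = {}
    for word, count in doc_counts.items():
        tf = count / doc_total
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        # Pointwise KL contribution p * log(p / q), with q floored at
        # `smoothing` so unseen words do not divide by zero.
        p = tf
        q = bg_counts[word] / bg_total if bg_total else 0.0
        q = max(q, smoothing)
        kl_term = p * math.log(p / q)
        features[word] = (tf * idf, kl_term, len(word))
    return features
```

In a pipeline of the kind the abstract describes, one such feature vector per candidate word, together with a keyword/non-keyword label, would be passed to a gradient-boosted classifier such as XGBoost; that training step is left out here because the record gives no details about it.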