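Expanding Queries Based on Word Embedding and Expansion Terms  Cited by: 3

Original title: 基于词嵌入与扩展词交集的查询扩展 (Query Expansion Based on Word Embedding and the Intersection of Expansion Terms)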

Authors: Huang Mingxuan [1,2]; Jiang Caoqing; Lu Shoudong [2]

Affiliations: [1] Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China; [2] School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2021, Issue 6, pp. 115-125 (11 pages)

Funding: This work is one of the research outcomes of a National Natural Science Foundation of China project (Grant No. 61762006).

Abstract: [Objective] To address the term mismatch problem in information retrieval, this paper proposes a query expansion model that fuses word embeddings with the intersection of expansion terms. [Methods] Word embedding training and association rule mining are both applied to the initially retrieved document set, yielding a word-embedding candidate expansion term set and a mining candidate expansion term set, respectively. The two candidate sets are then intersected to obtain the final expansion term set, which is used to expand the query. [Results] Experiments show that the MAP and P@5 of the proposed expansion model are higher than those of the baseline retrieval. Compared with similar query expansion methods from recent years, its average gains in MAP and P@5 range from 0.96% to 31.24% and from 1.07% to 13.55%, respectively. [Limitations] Only an experimental study was carried out; concrete application in real-world information retrieval systems still needs to be explored. [Conclusions] The proposed model improves the quality of expansion terms, improves retrieval performance, and curbs query topic drift and term mismatch.
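
The intersection fusion described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the thresholds, or the rule-mining algorithm, so cosine similarity, the `sim_threshold` and `min_conf` values, the confidence-based mining step, and all function names below are assumptions for illustration. The toy vectors are hand-written stand-ins for embeddings that would in practice be trained on the initially retrieved documents.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_candidates(query_terms, vectors, sim_threshold):
    """Assumed criterion: keep terms close to at least one query term."""
    cands = set()
    for term, vec in vectors.items():
        if term in query_terms:
            continue
        if any(cosine(vec, vectors[q]) >= sim_threshold
               for q in query_terms if q in vectors):
            cands.add(term)
    return cands

def mining_candidates(query_terms, docs, min_conf):
    """Assumed criterion: keep terms t where confidence(q -> t) >= min_conf."""
    doc_sets = [set(d) for d in docs]
    cands = set()
    for q in query_terms:
        with_q = [d for d in doc_sets if q in d]
        if not with_q:
            continue
        vocab = set().union(*with_q) - set(query_terms)
        for t in vocab:
            conf = sum(1 for d in with_q if t in d) / len(with_q)
            if conf >= min_conf:
                cands.add(t)
    return cands

def expand_query(query_terms, vectors, docs, sim_threshold=0.8, min_conf=0.6):
    """Append the intersection of both candidate sets to the original query."""
    emb = embedding_candidates(query_terms, vectors, sim_threshold)
    mined = mining_candidates(query_terms, docs, min_conf)
    return list(query_terms) + sorted(emb & mined)

# Toy "initially retrieved" documents and hand-made 2-d vectors (illustrative only).
docs = [
    ["car", "engine", "wheel"],
    ["car", "engine", "banana"],
    ["car", "banana", "road"],
    ["fruit", "banana"],
]
vectors = {
    "car": [1.0, 0.0], "engine": [0.9, 0.1], "wheel": [0.8, 0.2],
    "banana": [0.0, 1.0], "road": [0.7, 0.7], "fruit": [0.1, 0.9],
}
query = ["car"]
expanded = expand_query(query, vectors, docs)
print(expanded)
```

In this toy data, "wheel" is close to "car" in vector space but co-occurs with it too rarely, while "banana" co-occurs often but is semantically distant. Only "engine" passes both filters, which is the intuition behind using the intersection to limit topic drift.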

Keywords: Information Retrieval; Query Expansion; Text Mining; Deep Learning; Word Embedding

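Classification: TP393 [Automation and Computer Technology: Computer Application Technology]; G350 [Automation and Computer Technology: Computer Science and Technology]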

 
