Research on Query Extension Method Based on Large Language Model

Authors: WANG Hai-tao [1]; SHI Yang-kun (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454003, China)

Affiliation: [1] School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454003, Henan, China

Source: Computer Technology and Development (《计算机技术与发展》), 2025, No. 3, pp. 148-155 (8 pages)

Abstract: Retrieval-augmented generation (RAG) effectively mitigates hallucination in traditional large language models (LLMs) and their lack of timeliness when handling real-time, dynamic knowledge. However, existing methods still leave room for improvement in retrieval precision and recall. To address this, we propose Query2Query, a query-rewriting method that mines deeper features from the query to improve the semantic alignment between user input and knowledge-base text. The method treats the LLM as a generator and uses its generative capacity to rewrite the user's original query according to predefined prompts. Specifically, we design the TAO (Task-Action-Objective) prompt framework, which constrains the prompt's output along three dimensions: task, action, and objective. We then use the interrogatives "What", "How", and "Why" to rewrite the original query in a structured way, enriching its semantics so that the rewritten query covers more potentially relevant information and thereby improves retrieval accuracy. Finally, the model output is treated as a relevant document and, together with the original query, fed into a generation model to produce the final result. Evaluation on the TREC DL'19 and TREC DL'20 datasets shows that the method improves both precision and recall in retrieval tasks.
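The rewriting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the function names (`build_tao_prompt`, `parse_rewrites`, `expand_query`), the line-based output format, and the `fake_llm` stand-in are all assumptions made for illustration. Joining the rewrites to the original query by simple concatenation is likewise an assumed choice, since the abstract does not say how the two are combined.

```python
from typing import Callable, Dict

# The three interrogatives named in the abstract.
ASPECTS = ("What", "How", "Why")


def build_tao_prompt(query: str) -> str:
    """Assemble a Task-Action-Objective prompt (wording is hypothetical)."""
    return "\n".join([
        "Task: Rewrite the user query for document retrieval.",
        "Action: Write one rewrite per line, prefixed with 'What:', 'How:' and 'Why:'.",
        "Objective: Cover more potentially relevant information than the original query.",
        f"Query: {query}",
    ])


def parse_rewrites(llm_output: str) -> Dict[str, str]:
    """Keep only lines of the form '<What|How|Why>: <text>'."""
    rewrites: Dict[str, str] = {}
    for line in llm_output.splitlines():
        label, sep, text = line.partition(":")
        label = label.strip().capitalize()
        if sep and label in ASPECTS and text.strip():
            rewrites[label] = text.strip()
    return rewrites


def expand_query(query: str, generate: Callable[[str], str]) -> str:
    """Append the structured rewrites to the original query (assumed strategy)."""
    rewrites = parse_rewrites(generate(build_tao_prompt(query)))
    parts = [query] + [rewrites[a] for a in ASPECTS if a in rewrites]
    return " ".join(parts)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, used only for this demo."""
    query = prompt.rsplit("Query: ", 1)[1]
    return (
        f"What: what is {query}\n"
        f"How: how does {query} work\n"
        f"Why: why is {query} used"
    )


if __name__ == "__main__":
    print(expand_query("query expansion", fake_llm))
```

In practice `generate` would wrap a call to an actual language model, and the expanded query would be passed to whichever retriever the system uses.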

Keywords: retrieval-augmented generation; large language model; query expansion; feature extraction; prompt

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
