Authors: XIE Mian (解勉); CHEN Gang (陈刚); YU Xiao-han (余晓晗)
Affiliation: [1] School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China
Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 12, pp. 116-124 (9 pages)
Abstract: Efficient and accurate retrieval of relevant papers is crucial in modern academic research. Traditional retrieval methods usually rely on precise keyword input, which requires users to have enough domain knowledge to choose and use appropriate terms. To address this problem, the paper explores a method that uses Large Language Models (LLMs) to retrieve and analyze papers based on their content, aiming to lower the barrier that specialized search terms create while also supporting some analysis of paper content. First, a design framework for content-based paper retrieval and analysis is proposed; built on paper parsing and a vector database, it handles retrieval and analysis for a single paper, for multiple papers, and for vague, lay descriptions. Second, a paper parsing method is designed, together with LLM prompts for extracting a paper's main content; the prompts guide the model to focus on the paper's representative key information, which improves retrieval performance, and comparative analysis identifies the prompts that extract information more effectively. Finally, comparative experiments demonstrate the feasibility and effectiveness of the method: retrieval based on the full text of a paper and on vague, lay descriptions reaches an mAP of 98.47% and 99.51%, respectively.
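Keywords: document retrieval; document analysis; large language models; prompt engineering; academic papers
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]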
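Note on the metric: the abstract reports retrieval quality as mAP (mean average precision). The record does not say how the authors computed it, so the following is only a minimal sketch of the standard textbook definition, assuming binary relevance and a ranked list without duplicates for each query. The function names and the toy document IDs are hypothetical and do not come from the paper.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for one query (assumed: binary relevance, no duplicate results)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of per-query average precision over (ranked list, relevant set) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy queries with made-up document IDs, for illustration only.
    toy_runs = [
        (["a", "b", "c"], {"a", "c"}),
        (["x", "y"], {"y"}),
    ]
    print(f"mAP = {mean_average_precision(toy_runs):.4f}")
```

Under this definition, a score near 1.0 (such as the reported 98.47% and 99.51%) means that relevant papers appear at or near the top of the ranking for almost every query.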