Authors: Yao Rujing (姚汝婧); Wang Fang (王芳) [1,2]
Affiliations: [1] Department of Information Resources Management, Business School, Nankai University, Tianjin 300071, China; [2] Center for Network Society Governance, Nankai University, Tianjin 300071, China
Source: 《现代情报》 (Journal of Modern Information), 2025, No. 4, pp. 3-11, 73 (10 pages)
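Funding: Major Project of the National Social Science Fund of China, "Research on Intelligent Governance of Digital Government Based on Data Sharing and Knowledge Reuse" (Project No. 20ZDA039).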
Abstract: [Purpose/Significance] Theory is an essential component in the construction and development of the discipline of information science. Organizing and analyzing theories not only helps in understanding the origins and developmental trajectory of the discipline but can also help predict the development of emerging technologies. Efficient and accurate identification of theory entities plays a crucial role in deepening theoretical research. [Method/Process] This paper proposes an algorithm for extracting information science theory entities through collaboration between large and small models. It comprises three modules: word embedding enhancement, sample difficulty assessment, and a theory recognition model. First, a large language model pre-identifies theory entities. The pre-identified entities are combined with the original word embeddings of the sentence to form enhanced word embeddings, which are used to optimize the training of the domain-specific small model. In addition, the large language model assesses the difficulty of the samples, and the training strategy is adjusted accordingly to improve model performance. The algorithm combines the strong semantic understanding of large language models with the domain expertise of domain-specific small models. [Result/Conclusion] Experiments on an information science theory entity extraction dataset show that the proposed algorithm effectively improves the performance of theory entity extraction, achieving the best results on precision, recall, and F1 score.
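The abstract does not say how the pre-identified entities are merged with the word embeddings, or how difficulty scores change the training strategy. The sketch below is one possible reading, not the authors' implementation: it assumes the enhancement is a 0/1 flag appended to each token vector, and that difficulty scores in [0, 1] become per-sample loss weights. The function names, the `alpha` parameter, and the weighting rule are all hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def augment_embeddings(
    embeddings: List[Vector],
    entity_spans: List[Tuple[int, int]],
) -> List[Vector]:
    """Append a 0/1 flag to each token vector (assumed enhancement scheme).

    entity_spans holds (start, end) token indices, end exclusive, that a
    large language model is assumed to have pre-identified as theory entities.
    """
    flags = [0.0] * len(embeddings)
    for start, end in entity_spans:
        # Clamp the span to the sentence so bad LLM offsets cannot crash us.
        for i in range(max(start, 0), min(end, len(embeddings))):
            flags[i] = 1.0
    return [vec + [flag] for vec, flag in zip(embeddings, flags)]


def difficulty_weights(difficulties: List[float], alpha: float = 1.0) -> List[float]:
    """Map difficulty scores in [0, 1] to per-sample loss weights.

    Hypothetical rule: weight = 1 + alpha * difficulty, so samples the
    large model rates as harder contribute more to the training loss.
    """
    return [1.0 + alpha * min(max(d, 0.0), 1.0) for d in difficulties]


def weighted_mean_loss(losses: List[float], weights: List[float]) -> float:
    """Weighted average of per-sample losses from the small model."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)
```

For example, calling `augment_embeddings` on a three-token sentence with the span `(1, 3)` flags the last two tokens, and the resulting vectors are one dimension longer than the originals. Other choices, such as concatenating a learned entity embedding or ordering samples from easy to hard, would fit the abstract's description equally well.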
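Keywords: large language model; information science theory; entity recognition; sample learning difficulty; model collaboration
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]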