Research on the Construction of Hierarchical Relationships in Thesaurus Based on the Full-Process Fine-tuning of Large Language Model

Authors: Li Zeyu; Liu Wei[1]

Affiliation: [1] Institute of Scientific and Technical Information of China, Beijing 100038

Source: Information Studies: Theory & Application, 2025, No. 4, pp. 152-162 (11 pages)

Abstract: [Purpose/significance] As the operating environment of knowledge organization systems changes, knowledge organization is becoming increasingly important. To overcome the difficulties of traditional thesaurus construction and application, this paper explores a new paradigm for thesaurus construction that draws on recent large language model technology. [Method/process] Starting from the characteristics of the thesaurus itself and the way thesauri are constructed, the study fine-tunes a large language model with a full-process strategy (continued pre-training, supervised fine-tuning, and reinforcement learning) combined with a local knowledge base. Empirical studies are conducted in the fields of "Quantum Technology" and "Theoretical Mechanics". [Result/conclusion] The empirical results show that the scheme combining continued pre-training, a "multi-strategy data processing fine-tuning scheme", and Reinforcement Learning from Human Feedback (RLHF) performs better. For the existing thesaurus in the "Theoretical Mechanics" field, hierarchical relationships are constructed with an accuracy of 89.06%; for the emerging field of "Quantum Technology", the accuracy is 63.02%. This indicates that the scheme can construct hierarchical relationships for existing thesauri and performs well for thesauri in new fields, showing a degree of feasibility and offering a reference for building thesauri in emerging areas.

Keywords: thesaurus; large language model; hierarchical relationship; knowledge organization system; semantic relationship

CLC number: G63 [Culture and Science / Education]
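The abstract reports accuracy figures for hierarchical relationship construction but does not say how they were computed. The following is a minimal sketch, assuming each generated hierarchical relation is a (narrower term, broader term) pair and that accuracy is the share of generated pairs confirmed by a reference thesaurus. The function name `hierarchy_accuracy` and all example terms are hypothetical illustrations, not the paper's evaluation code or data.

```python
# Hypothetical sketch: the paper's actual evaluation protocol is not given in the abstract.
from typing import Iterable, Set, Tuple

# A hierarchical relation as (narrower term, broader term), i.e. NT -> BT.
Relation = Tuple[str, str]


def hierarchy_accuracy(predicted: Iterable[Relation], reference: Set[Relation]) -> float:
    """Return the share of predicted NT->BT pairs that appear in the reference set."""
    pairs = list(predicted)
    if not pairs:
        return 0.0  # no predictions: define accuracy as 0 to avoid division by zero
    correct = sum(1 for pair in pairs if pair in reference)
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy reference hierarchy (illustrative terms, not taken from the paper).
    reference = {
        ("statics", "theoretical mechanics"),
        ("kinematics", "theoretical mechanics"),
        ("dynamics", "theoretical mechanics"),
        ("quantum key distribution", "quantum communication"),
    }
    # Toy model output: three pairs match the reference, one does not.
    predicted = [
        ("statics", "theoretical mechanics"),
        ("kinematics", "theoretical mechanics"),
        ("dynamics", "theoretical mechanics"),
        ("quantum key distribution", "quantum computing"),
    ]
    print(f"accuracy = {hierarchy_accuracy(predicted, reference):.2%}")  # accuracy = 75.00%
```

For an emerging field with no existing thesaurus, the reference set would have to come from expert judgement rather than a published vocabulary; this sketch does not model that step.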
