Construction of a digital twin water conservancy knowledge graph integrating a large language model and prompt learning


Authors: YANG Yan[1], YE Feng[1,2], XU Dong, ZHANG Xuejie[1], XU Jin[2,3,4]

Affiliations: [1] College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu 211100, China; [2] Key Laboratory of Hydrologic-Cycle and Hydrodynamic-System of Ministry of Water Resources (Hohai University), Nanjing, Jiangsu 210024, China; [3] College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing, Jiangsu 210098, China; [4] The National Key Laboratory of Water Disaster Prevention (Hohai University), Nanjing, Jiangsu 210098, China

Source: Journal of Computer Applications (《计算机应用》), 2025, No. 3, pp. 785-793 (9 pages)

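Funding: National Key Research and Development Program of China (2022YFC3202600); Major Science and Technology Project of the Ministry of Water Resources (SKS-2022139).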

Abstract: Constructing a knowledge graph for digital twin water conservancy construction, and mining the latent relationships among water conservancy construction objects, can help practitioners optimize design schemes and decisions. Digital twin water conservancy construction is interdisciplinary and has a complex knowledge structure, while general-purpose knowledge extraction models have not learned water conservancy domain knowledge and extract it with insufficient precision. To improve extraction accuracy, a Digital Twin water conservancy construction Knowledge Extraction method based on a Large Language Model (DTKE-LLM) is proposed. The method deploys a local Large Language Model (LLM) through LangChain, integrates digital twin water conservancy domain knowledge, and fine-tunes the LLM with prompt learning; the LLM then uses its semantic understanding and generation capabilities to extract knowledge. In addition, a heterogeneous-source entity alignment strategy is designed to refine the entity extraction results. Comparison and ablation experiments were carried out on a water conservancy domain corpus to verify the effectiveness of the method. The comparison experiments show that DTKE-LLM achieves higher precision than both the deep learning-based BiLSTM-CRF (Bidirectional Long Short-Term Memory Conditional Random Field) named entity recognition model and the general information extraction model UIE (Universal Information Extraction). The ablation experiments show that, compared with ChatGLM2-6B (a chat model with about 6 billion parameters), DTKE-LLM improves the F1 scores of entity extraction and relation extraction by 5.5 and 3.2 percentage points, respectively. The proposed method therefore constructs the digital twin water conservancy construction knowledge graph while maintaining the quality of knowledge graph construction.
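
The abstract names two ingredients that a small example can make concrete: a prompt that asks the LLM for triples, and the precision and F1 scores used to compare systems. The following is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the function names `build_prompt` and `prf1`, and the sample triples are all hypothetical, and no LangChain or ChatGLM2 call is shown.

```python
from typing import Iterable, Set, Tuple

# A knowledge triple: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def build_prompt(text: str, examples: Iterable[Tuple[str, Triple]]) -> str:
    """Assemble a few-shot extraction prompt (hypothetical template)."""
    lines = ["Extract (head, relation, tail) triples from the text."]
    for sample_text, (head, relation, tail) in examples:
        lines.append(f"Text: {sample_text}")
        lines.append(f"Triples: ({head}, {relation}, {tail})")
    # The final, unanswered slot is what the model is asked to complete.
    lines.append(f"Text: {text}")
    lines.append("Triples:")
    return "\n".join(lines)


def prf1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative data only; these triples are not from the paper's corpus.
gold = {("reservoir", "monitored_by", "water level sensor"),
        ("dam", "part_of", "reservoir")}
predicted = {("reservoir", "monitored_by", "water level sensor"),
             ("dam", "located_in", "river basin")}

prompt = build_prompt(
    "The dam is part of the reservoir.",
    [("The reservoir is monitored by a water level sensor.",
      ("reservoir", "monitored_by", "water level sensor"))],
)
precision, recall, f1 = prf1(predicted, gold)
```

Exact-match scoring over sets is one common convention for extraction tasks; the paper may count matches differently.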

Keywords: large language model; prompt learning; knowledge graph; knowledge extraction; digital twin water conservancy construction

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
