Language model based on knowledge fusion and cluster guidance for MOFs synthesis information classification


Authors: LI Haijun, WANG Zhuo [1,2] (Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; University of Chinese Academy of Sciences, Beijing 100049, China)

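Affiliations: [1] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China; [2] Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, Liaoning, China; [3] University of Chinese Academy of Sciences, Beijing 100049, China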

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 18, pp. 179-186 (8 pages)

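Funding: National Natural Science Foundation of China, General Program (62273337); 2023 Liaoning Province Artificial Intelligence Innovation and Development Plan Project (2023JH26/10300014).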

Abstract: The experimental steps for synthesizing metal-organic frameworks (MOFs) are usually concentrated in a single paragraph of a scientific paper, and extracting these steps from the literature is important for advancing the development of new MOFs. Existing work has two problems. First, it treats the whole paper as ordinary text and splits it directly by sentence or paragraph, ignoring the higher-level knowledge hidden in the context. Second, the models do not mine the hidden patterns within the data in depth. To address these problems, a high-quality knowledge supplementation task based on a knowledge fusion strategy is proposed: it exploits the subtleties of the editorial style of scientific literature and of structured web data to gather key contextual knowledge into each paragraph, thereby optimizing its text representation. An adaptive classification algorithm based on clustering guidance is then proposed: a clustering algorithm divides the text representations into multiple clusters, and different classification models are trained to fit different clusters, which effectively reduces the impact of data overlap and improves model recall. Experimental results show that the proposed method substantially outperforms mainstream baseline models.
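The clustering-guided idea in the abstract (split the representations into clusters, then fit a separate classifier to each cluster) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: k-means with a deterministic farthest-point start stands in for the unspecified clustering algorithm, a nearest-class-mean rule stands in for the unspecified classification models, and hand-made 2-D vectors stand in for text representations. The names `ClusterGuidedClassifier`, `kmeans`, and `class_means` are hypothetical.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Plain k-means; a stand-in for whatever clustering the paper uses."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(groups[i]) if groups[i] else centers[i] for i in range(k)]
    return centers


def class_means(X, y):
    """Map each label to the mean vector of its examples."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    return {label: mean(vs) for label, vs in by_label.items()}


class ClusterGuidedClassifier:
    """One nearest-class-mean model per cluster (illustrative only)."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        self.fallback = class_means(X, y)  # used only if a cluster has no model
        members = defaultdict(lambda: ([], []))
        for x, label in zip(X, y):
            xs, ys = members[nearest(x, self.centers)]
            xs.append(x)
            ys.append(label)
        # Train a separate model on the examples of each cluster.
        self.models = {c: class_means(xs, ys) for c, (xs, ys) in members.items()}
        return self

    def predict(self, x):
        # Route the input to its cluster, then apply that cluster's model.
        model = self.models.get(nearest(x, self.centers), self.fallback)
        return min(model, key=lambda label: math.dist(x, model[label]))


# Toy data: the label depends on the second coordinate, but with opposite
# sign in the two clusters, so the classes overlap when pooled together.
X = [[0.0, 1.0], [0.2, 1.1], [0.1, 0.9], [0.0, -1.0], [0.2, -1.1], [0.1, -0.9],
     [10.0, -1.0], [10.2, -1.1], [10.1, -0.9], [10.0, 1.0], [10.2, 1.1], [10.1, 0.9]]
y = ["synthesis"] * 3 + ["other"] * 3 + ["synthesis"] * 3 + ["other"] * 3
clf = ClusterGuidedClassifier(k=2).fit(X, y)
print(clf.predict([0.1, 0.8]), clf.predict([10.1, 0.8]))  # synthesis other
```

In this toy set the pooled class means of "synthesis" and "other" fall at essentially the same point, so a single nearest-class-mean model could not tell them apart, while the per-cluster models can. This is only meant to make the "data overlap" motivation concrete; it says nothing about the gains reported in the paper.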

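Keywords: metal-organic frameworks; scientific literature; knowledge fusion; text representation; clustering guidance; adaptive classification; data overlap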

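Classification No.: TN919.65-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]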

 
