Named Entity Recognition Driven by High-Quality Text Data Accelerates the Knowledge Discovery of Nickel-Based Single Crystal Superalloys

Authors: LIU Yue [1]; YAO Wenxuan; LIU Dahui; DING Lin; YANG Zhengwei; LIU Wei; YU Tao [3]; SHI Siqi

Affiliations: [1] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [2] Materials Genome Institute, Shanghai University, Shanghai 200444, China; [3] Division of Functional Materials, Central Iron and Steel Research Institute, Beijing 100081, China; [4] School of Materials Science and Engineering, Shanghai University, Shanghai 200444, China

Source: Acta Metallurgica Sinica (《金属学报》), 2024, No. 10, pp. 1429-1438 (10 pages)

Funding: National Natural Science Foundation of China (Nos. 52073169 and 92270124); National Key Research and Development Program of China (No. 2021YFB3802101).

Abstract: Knowledge of the structure-activity relationships of nickel-based single crystal superalloys is mainly stored as unstructured text in the vast body of published scientific literature, and its effective utilization can accelerate the design of high-performance materials. Named entity recognition (NER) methods can extract key information from unstructured text, automating otherwise tedious text-mining tasks, and have become an important aid to the development of new materials. However, existing NER methods, especially deep-learning-based ones, typically rely on large amounts of corpus data and handle cross-domain tasks poorly. To the best of our knowledge, no prior work has applied deep-learning-based NER to knowledge discovery for nickel-based single crystal superalloys, so existing methods are difficult to adapt to this field. This work proposes a semantic-features-fused NER (SF-NER) method based on deep learning to accurately extract knowledge about nickel-based single crystal superalloys from abstract text. Because data quality determines the performance of NER models, a high-quality annotated corpus for these superalloys (19405 entities across eight entity types) was constructed via distant supervision, using a domain-specific materials dictionary created under the guidance of domain knowledge. To accurately capture material-specific terms from this corpus, an encoding-fusion strategy for word representation was proposed that encodes the essential semantic features of materials from various perspectives. Based on these semantic features, a bidirectional long short-term memory-conditional random field (Bi-LSTM-CRF) model was built to capture key semantic information in sentence sequences and thus accurately predict entity labels. The experimental results show that SF-NER can accurately identify the entity categories of nickel-based single crystal superalloys (F1 = 0.84), effectively screen the key factors affecting the service performance of superalloys, and recommend high-importance descriptors that can be used to mine the structure-activity relationships of service performance.
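
The abstract mentions building a materials dictionary and using it for distant supervision to annotate the corpus. As a rough illustration only, the sketch below shows one common way such labeling can work: greedy longest-match lookup of dictionary terms, emitting BIO tags. The dictionary entries, the entity type names (`MATERIAL`, `PROPERTY`, `ELEMENT`), and the helper functions `tokenize` and `distant_label` are all hypothetical; the abstract does not specify the actual dictionary, the eight entity types, or the matching procedure.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-dictionary (surface form -> entity type).
# These entries and type names are illustrative assumptions, not the paper's.
MATERIAL_DICT: Dict[str, str] = {
    "nickel-based single crystal superalloy": "MATERIAL",
    "creep rupture life": "PROPERTY",
    "rhenium": "ELEMENT",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens.

    Hyphenated terms such as "nickel-based" are kept as a single token.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def distant_label(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the dictionary."""
    entries: List[Tuple[List[str], str]] = []
    for term, entity_type in dictionary.items():
        term_tokens = tokenize(term)
        if term_tokens:  # skip entries that tokenize to nothing
            entries.append((term_tokens, entity_type))
    # Try longer terms first so that they win over shorter overlapping terms.
    entries.sort(key=lambda entry: len(entry[0]), reverse=True)

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for term_tokens, entity_type in entries:
            n = len(term_tokens)
            if tokens[i:i + n] == term_tokens:
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{entity_type}"
                i += n
                break
        else:
            # No dictionary term starts at this position.
            i += 1
    return tags


sentence = ("Rhenium improves the creep rupture life of the "
            "nickel-based single crystal superalloy.")
tokens = tokenize(sentence)
tags = distant_label(tokens, MATERIAL_DICT)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Labels produced this way are noisy (they miss synonyms and cannot resolve ambiguous terms), which is presumably why the abstract stresses domain-knowledge guidance and corpus quality before training the sequence model.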

Keywords: data quality; deep learning; named entity recognition; nickel-based single crystal superalloy; domain knowledge

Classification code: TG131 [General Industrial Technology - Materials Science and Engineering]

 
