Literature classification and its applications in condensed matter physics and materials science by natural language processing

Authors: Siyuan Wu (吴思远), Tiannian Zhu (朱天念), Sijia Tu (涂思佳), Ruijuan Xiao (肖睿娟), Jie Yuan (袁洁), Quansheng Wu (吴泉生), Hong Li (李泓), Hongming Weng (翁红明)

Affiliations: [1] Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China; [3] College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China; [4] Condensed Matter Physics Data Center of Chinese Academy of Sciences, Beijing 100190, China

Source: Chinese Physics B (English edition), 2024, Issue 5, pp. 117-123 (7 pages)

Funding: Informatization Plan of Chinese Academy of Sciences (Grant No. CASWX2021SF-0102); National Key R&D Program of China (Grant Nos. 2022YFA1603903, 2022YFA1403800, and 2021YFA0718700); National Natural Science Foundation of China (Grant Nos. 11925408, 11921004, and 12188101); Chinese Academy of Sciences (Grant No. XDB33000000).

Abstract: The exponential growth of the literature is limiting researchers' access to comprehensive information in their fields. While natural language processing (NLP) may offer an effective solution for literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can generate labelled datasets iteratively with limited human input. We apply this method to train NLP models that classify literature related to several research directions, i.e., battery, superconductor, topological material, and artificial intelligence (AI) in materials science. When applied to a larger dataset distinct from the training and testing datasets, the trained NLP 'battery' model achieves an F1 score of 0.738, which indicates the accuracy and reliability of this scheme. Furthermore, our approach demonstrates that even with insufficient data, the under-trained model from the first few cycles can identify relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
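The abstract says only that labelled data are generated iteratively, over several training cycles, with limited human input, and that the result is scored with F1. One generic way to realize such a loop is self-training (pseudo-labelling): fit a classifier on a small labelled seed set, add the unlabelled papers it labels with high confidence, refit, and repeat. The Python sketch below is a hedged illustration of that general idea, not the authors' implementation: the multinomial naive Bayes classifier, the confidence `threshold`, `max_cycles`, the toy `LABELLED`/`UNLABELLED` data and all helper names are hypothetical choices. It also computes a single-class F1 score, the harmonic mean of precision and recall.

```python
import math
from collections import Counter

# Toy seed set and unlabelled pool (hypothetical data, not from the paper).
LABELLED = [
    ("lithium ion battery cathode electrolyte", "battery"),
    ("solid state electrolyte for lithium battery anode", "battery"),
    ("superconducting transition temperature cuprate", "superconductor"),
    ("high pressure hydride superconductor critical temperature", "superconductor"),
]
UNLABELLED = [
    "cathode coating improves battery cycle life",
    "electrolyte additive for lithium metal anode",
    "cuprate superconductor pairing mechanism",
    "critical temperature of hydride under pressure",
]


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would use a proper tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in model)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        # Per-class log-likelihood, then a numerically stable softmax.
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.vocab_size
            scores[c] = self.log_prior[c] + sum(
                math.log((self.counts[c][tok] + 1) / denom) for tok in tokenize(doc)
            )
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


def predict(model, doc):
    proba = model.predict_proba(doc)
    return max(proba, key=proba.get)


def self_train(labelled, unlabelled, threshold=0.9, max_cycles=5):
    """Each cycle, move confidently pseudo-labelled documents into the training set."""
    docs = [d for d, _ in labelled]
    labels = [y for _, y in labelled]
    pool = list(unlabelled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_cycles):
        confident, rest = [], []
        for doc in pool:
            proba = model.predict_proba(doc)
            label = max(proba, key=proba.get)
            (confident if proba[label] >= threshold else rest).append((doc, label))
        if not confident:
            break  # no new pseudo-labels; stop early
        # A human spot-check of `confident` could go here; it is not modelled.
        for doc, label in confident:
            docs.append(doc)
            labels.append(label)
        pool = [doc for doc, _ in rest]
        model = NaiveBayes().fit(docs, labels)
    return model, len(docs)


def f1_score(y_true, y_pred, positive):
    """Binary F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model, n_train = self_train(LABELLED, UNLABELLED, threshold=0.8)
    test_docs = ["lithium battery electrolyte", "superconductor critical temperature"]
    test_labels = ["battery", "superconductor"]
    preds = [predict(model, d) for d in test_docs]
    print(n_train, preds, f1_score(test_labels, preds, positive="battery"))
```

On this toy data, three unlabelled documents pass the 0.8 threshold in the first cycle and the fourth passes only after refitting, which mirrors how an early, under-trained model can still seed later cycles. The toy F1 value is not comparable to the 0.738 reported for the 'battery' model, which was measured on a much larger dataset.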

Keywords: natural language processing; text mining; materials science

Chinese Library Classification: O469 [Science: Condensed Matter Physics]; TP391.1 [Science: Electronic Physics]; TB30 [Science: Physics]
