Astronomical Knowledge Entity Extraction in Astrophysics Journal Articles via Large Language Models  Cited by: 2


Authors: Wujun Shao, Rui Zhang, Pengli Ji, Dongwei Fan, Yaohua Hu, Xiaoran Yan, Chenzhou Cui, Yihan Tao, Linying Mi, Lang Chen

Affiliations: [1] National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] National Astronomical Data Center, Beijing 100101, China; [4] Research Institute of Artificial Intelligence, Zhejiang Lab, Hangzhou 311100, China; [5] Guilin University, Guangxi 541006, China; [6] Xidian University, Xi’an 710126, China

Source: Research in Astronomy and Astrophysics (English edition), 2024, Issue 6, pp. 140-155 (16 pages)

Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant Nos. 12273077, 72101068, 12373110, and 12103070); the National Key Research and Development Program of China (grants 2022YFF0712400 and 2022YFF0711500); the 14th Five-year Informatization Plan of Chinese Academy of Sciences (CAS-WX2021SF-0204); and the Astronomical Big Data Joint Research Center, co-founded by National Astronomical Observatories, Chinese Academy of Sciences and Alibaba Cloud.

Abstract: Astronomical knowledge entities, such as celestial object identifiers, are crucial for literature retrieval, knowledge graph construction, and other research and applications in the field of astronomy. Traditional methods of extracting knowledge entities from texts face numerous obstacles that are difficult to overcome. Consequently, there is a pressing need for improved methods to extract them efficiently. This study explores the potential of pre-trained Large Language Models (LLMs) to perform the astronomical knowledge entity extraction (KEE) task on astrophysical journal articles using prompts. We propose a prompting strategy called PromptKEE, which includes five prompt elements, and design eight combination prompts based on them. We select four representative LLMs (Llama-2-70B, GPT-3.5, GPT-4, and Claude 2) and attempt to extract the most typical astronomical knowledge entities, celestial object identifiers and telescope names, from astronomical journal articles using these eight combination prompts. To accommodate the models' token limitations, we construct two data sets: the full texts and paragraph collections of 30 articles. Using the eight prompts, we test on full texts with GPT-4 and Claude 2, and on paragraph collections with all four LLMs. The experimental results demonstrate that pre-trained LLMs show significant potential in performing KEE tasks, but their performance varies between the two data sets. Furthermore, we analyze some important factors that influence the performance of LLMs in entity extraction and provide insights for future KEE tasks in astrophysical articles using LLMs. Finally, compared to other KEE methods, LLMs exhibit strong competitiveness in multiple aspects.

Keywords: astronomical databases: miscellaneous; virtual observatory tools; methods: data analysis

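Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; P14 [Automation and Computer Technology: Control Science and Engineering]; G237.5 [Astronomy and Earth Sciences: Astrophysics]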

 
