A New Machine-Learning Extracting Approach to Construct a Knowledge Base: A Case Study on Global Stromatolites over Geological Time  


Authors: Xiaobo Zhang, Hao Li, Qiang Liu, Zhenhua Li, Claire E. Reymond, Min Zhang, Yuangeng Huang, Hongfei Chen, Zhong-Qiang Chen

Affiliations: [1] School of Computer, China University of Geosciences, Wuhan 430078, China; [2] Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan 430074, China; [3] School of Earth Resources, China University of Geosciences, Wuhan 430074, China; [4] State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430078, China

Source: Journal of Earth Science (English edition), 2023, No. 5, pp. 1358-1373 (16 pages)

Funding: Supported by three grants from the National Natural Science Foundation of China (Nos. 41821001, 41902315, 41930322).

Abstract: Within any scientific discipline, large amounts of data are buried in various literature repositories and archives, making it difficult to manually extract useful information from these data swamps. Machine-learning extraction of data is therefore necessary for big-data-based studies. Here, we develop a new text-mining technique to reconstruct the global database of Precambrian to Recent stromatolites, providing a better understanding of secular changes in stromatolites through geological time. The step-by-step data extraction process is described below. First, PDF documents of the stromatolite-containing literature were collected and converted into text format. Second, a glossary and tag-labeling system using natural language processing (NLP) software was employed to search for all possible candidate pairs in each sentence of the collected papers. Third, each candidate pair and its features were represented in a factor graph model, using a series of heuristic procedures to score the weight of each pair feature. Occurrence data of stromatolites versus stratigraphical units (abbreviated as Strata), facies types, locations, and ages worldwide were extracted from the literature. The extraction accuracies for these four categories are 92%/464, 87%/778, 92%/846, and 93%/405, respectively, from 3 750 scientific abstracts, and 90%/1 734, 86%/2 869, 90%/2 055, and 91%/857, respectively, from 11 932 papers. A total of 10 072 unique datum items were identified. The newly obtained stromatolite dataset demonstrates that stratigraphical occurrences of stromatolites reached a pronounced peak during the Proterozoic (2 500–541 Ma), followed by a distinct fall during the Early Phanerozoic and overall fluctuations through the Phanerozoic (541–0 Ma). Globally, seven stromatolite hotspots were identified from the new dataset: the western United States, the eastern United States, western Europe, India, South Africa, northern China, and southern China.
The proportional occurrences of inland aquatic stromatolites remain r
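
The second step of the abstract (glossary-based tagging followed by a per-sentence search for candidate pairs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the glossary entries, the tag names, the function names, and the example sentence (including the invented "Alpha Formation") are hypothetical and are not taken from the paper, which does not specify its NLP software or glossary in this record. The sketch covers candidate generation only, not the factor-graph scoring of the third step.

```python
import re
from itertools import product

# Hypothetical toy glossary (tag -> terms); the paper's real glossary is not given here.
GLOSSARY = {
    "STROMATOLITE": ["stromatolite", "stromatolites"],
    "STRATA": ["Alpha Formation"],      # invented unit name, for illustration only
    "LOCATION": ["northern China"],
    "AGE": ["Proterozoic"],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_mentions(sentence):
    """Return (tag, matched_text, start_offset) for every glossary hit, in text order."""
    mentions = []
    for tag, terms in GLOSSARY.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                mentions.append((tag, m.group(0), m.start()))
    return sorted(mentions, key=lambda item: item[2])


def candidate_pairs(text):
    """Pair each stromatolite mention with every other tagged mention in the same sentence."""
    pairs = []
    for idx, sentence in enumerate(split_sentences(text)):
        mentions = tag_mentions(sentence)
        subjects = [m for m in mentions if m[0] == "STROMATOLITE"]
        others = [m for m in mentions if m[0] != "STROMATOLITE"]
        for subj, other in product(subjects, others):
            pairs.append((idx, subj[1], other[0], other[1]))
    return pairs


if __name__ == "__main__":
    sample = ("Stromatolites occur in the Alpha Formation of northern China. "
              "The Proterozoic record is discussed elsewhere.")
    for pair in candidate_pairs(sample):
        print(pair)
```

In this sketch the second sentence yields no pairs because it contains no stromatolite mention. A real pipeline would additionally attach features to each pair and learn weights for them, as the abstract describes for the factor graph step.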

Keywords: machine learning; knowledge base construction; stromatolites; Precambrian; knowledge graph

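Classification codes: P628 [Astronomy and Earth Sciences: Geological and Mineral Exploration]; P597.3 [Astronomy and Earth Sciences: Geology]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]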

 
