Authors: Xiaobo Zhang, Hao Li, Qiang Liu, Zhenhua Li, Claire E. Reymond, Min Zhang, Yuangeng Huang, Hongfei Chen, Zhong-Qiang Chen
Affiliations: [1] School of Computer, China University of Geosciences, Wuhan 430078, China; [2] Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan 430074, China; [3] School of Earth Resources, China University of Geosciences, Wuhan 430074, China; [4] State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430078, China
Source: Journal of Earth Science (地球科学学刊, English edition), 2023, No. 5, pp. 1358-1373 (16 pages)
Funding: Supported by three grants from the National Natural Science Foundation of China (Nos. 41821001, 41902315, 41930322).
Abstract: In any scientific discipline, a large amount of data is buried in literature repositories and archives, making it difficult to extract useful information manually from these data swamps. Machine-learning extraction of data is therefore necessary for big-data-based studies. Here, we develop a new text-mining technique to reconstruct a global database of Precambrian to Recent stromatolites, providing a better understanding of secular changes in stromatolites through geological time. The data extraction process has three steps. First, PDF documents of the stromatolite-bearing literature were collected and converted into text format. Second, a glossary and tag-labeling system using NLP (Natural Language Processing) software was employed to search for all possible candidate pairs in each sentence of the collected papers. Third, each candidate pair and its features were represented in a factor graph model, using a series of heuristic procedures to score the weight of each pair feature. Occurrence data of stromatolites versus stratigraphical units (abbreviated as Strata), facies types, locations, and ages worldwide were extracted from the literature; the extraction accuracies are 92%/464, 87%/778, 92%/846, and 93%/405, respectively, from 3 750 scientific abstracts, and 90%/1 734, 86%/2 869, 90%/2 055, and 91%/857, respectively, from 11 932 papers. A total of 10 072 unique data items were identified. The newly obtained stromatolite dataset demonstrates that stratigraphical occurrences of stromatolites reached a pronounced peak during the Proterozoic (2 500–541 Ma), followed by a distinct fall during the Early Phanerozoic and overall fluctuations through the Phanerozoic (541–0 Ma). Globally, seven stromatolite hotspots were identified from the new dataset: the western United States, eastern United States, western Europe, India, South Africa, northern China, and southern China.
The proportional occurrences of inland aquatic stromatolites remain r
Keywords: machine learning; knowledge base construction; stromatolites; Precambrian; knowledge graph
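Classification: P628 [Astronomy and Earth Science: Geological and Mineral Exploration]; P597.3 [Astronomy and Earth Science: Geology]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]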
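Illustrative sketch: the second step described in the abstract (searching each sentence for candidate pairs with a glossary and tag labels) can be pictured with the short Python example below. This is only a minimal sketch under stated assumptions, not the authors' implementation: the glossary terms, the tag names `FOSSIL` and `STRATA`, the regex-based sentence splitter, and the `token_distance` feature are all hypothetical stand-ins, and the factor graph scoring of the third step is not reproduced.

```python
import re

# Hypothetical mini-glossary; the glossary used in the paper is not given in the abstract.
FOSSIL_TERMS = {"stromatolite", "stromatolites"}
STRATA_HEADS = {"Formation", "Group", "Member"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_mentions(sentence):
    """Return (tag, mention_text, token_index) for each glossary hit in a sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    mentions = []
    for i, tok in enumerate(tokens):
        if tok.lower() in FOSSIL_TERMS:
            mentions.append(("FOSSIL", tok, i))
        elif tok in STRATA_HEADS and i > 0 and tokens[i - 1][0].isupper():
            # Treat "<Capitalized word> Formation/Group/Member" as a stratigraphic unit.
            mentions.append(("STRATA", f"{tokens[i - 1]} {tok}", i))
    return mentions


def candidate_pairs(text):
    """Pair every FOSSIL mention with every STRATA mention in the same sentence."""
    pairs = []
    for sent in split_sentences(text):
        mentions = tag_mentions(sent)
        fossils = [m for m in mentions if m[0] == "FOSSIL"]
        strata = [m for m in mentions if m[0] == "STRATA"]
        for _, f_text, f_pos in fossils:
            for _, s_text, s_pos in strata:
                pairs.append({
                    "fossil": f_text,
                    "strata": s_text,
                    # Simple illustrative feature: distance in tokens between the mentions.
                    "token_distance": abs(f_pos - s_pos),
                    "sentence": sent,
                })
    return pairs


# Made-up sample text; "Example Formation" is a placeholder, not a real unit.
sample = ("Stromatolites are abundant in the Example Formation. "
          "The overlying shale contains no microbialites.")

for pair in candidate_pairs(sample):
    print(pair["fossil"], "|", pair["strata"], "|", pair["token_distance"])
```

In a fuller pipeline, each candidate pair and its features would then be passed to a scoring model (a factor graph in the paper) that decides which pairs are genuine occurrences.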