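Affiliations: [1] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101 [2] University of Chinese Academy of Sciences, Beijing 100049 [3] School of Computer Science, Beijing Institute of Technology, Beijing 100081 [4] Key Laboratory of Regional Sustainable Development Analysis and Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101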
Source: Journal of Geo-information Science (《地球信息科学学报》), 2022, No. 12, pp. 2342-2355 (14 pages)
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA23100103).
Abstract: As climate warming intensifies, extreme weather events are occurring more often worldwide, and major meteorological disasters are becoming more frequent. Studying the relationship between climate change and the frequency of meteorological disasters is important for disaster prevention and mitigation under climate change. Because the literature and ubiquitous web data contain a vast number of spatiotemporal meteorological disaster events, this paper develops a method, based on natural language processing, for automatically extracting such events from text. (1) A coarse-to-fine method is proposed for building an annotated training corpus of meteorological disasters from professional literature. First, a unified, text-event-oriented meteorological disaster knowledge system is constructed to resolve the ambiguities and incompatibilities among different sources. Then a coarse annotation method based on chapter structure is built, and fine-grained screening of the annotated corpus is developed separately for long texts (modern Chinese), using a Labeled LDA model, and for short texts (classical Chinese), using TF-IDF and N-gram models, which solves the problem of rapid corpus construction. (2) Based on the BERT-CNN model, an automatic classification method for spatiotemporal meteorological disaster events is developed that fuses contextual semantic features with multi-granularity local semantic features and handles long and short texts in an integrated way. (3) Using this method, spatiotemporal disaster events are automatically extracted from classical Chinese texts and from ubiquitous web data, with macro-F1 values of 89.09% and 80.06%, respectively; the distribution of the main extracted events correlates well with professional statistics. (4) Based on these results, the spatiotemporal evolution of disasters across China's historical periods is reconstructed. The volume of disaster data shows a gradual overall increase from period to period, and rainstorms, floods, and droughts are the main disaster types affecting China. The method can both discover events in long web texts and detect events in short classical Chinese texts, providing a new technical approach for the convenient use of text data in meteorological disaster research and monitoring.
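Note on the metric: the macro-F1 values reported in the abstract are, by the standard definition, the unweighted mean of the per-class F1 scores, so rare disaster types count as much as common ones. The following is a minimal sketch of that definition only, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions invented for illustration and are not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    # F1 = 2*TP / (2*TP + FP + FN); the denominator is at least 1 for every
    # label here because each label occurs in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (hypothetical, not from the paper).
y_true = ["rainstorm", "rainstorm", "flood", "flood", "drought", "drought"]
y_pred = ["rainstorm", "flood", "flood", "flood", "drought", "rainstorm"]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.6556
```

In this toy case the per-class F1 scores are 0.5 (rainstorm), 0.8 (flood), and about 0.667 (drought), and their plain average gives the macro-F1.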
Keywords: meteorological disasters; spatiotemporal events; knowledge system; corpus; text classification; BERT-CNN model; event extraction
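Classification No.: P429 [Astronomy and Earth Sciences: Atmospheric Science and Meteorology]; TP391.1 [Automation and Computer Technology: Computer Application Technology]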