Authors: LIN Sen (林森); LIU Beibei (刘蓓蓓); LI Jianwen (李建文); LIU Xu (刘旭); QIN Kun (秦昆) [2]; GUO Guizhen (郭桂祯)
Affiliations: [1] National Disaster Reduction Center of the Emergency Management Department, Beijing 100124, China; [2] School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, Hubei, China
Source: Geomatics and Information Science of Wuhan University (《武汉大学学报(信息科学版)》), 2024, No. 9, pp. 1661-1671 (11 pages)
Funding: National Key Research and Development Program of China (2018YFC1508806).
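Abstract: Social media data are timely, fast-spreading, information-rich, low-cost, and voluminous, and have become an important information source for analyzing sudden disaster events. They also have drawbacks: uneven quality, redundancy combined with incompleteness, uneven coverage, no unified standard, and privacy and security that are hard to control. To use social media data as a precise basis for disaster emergency response, advanced techniques are needed that can screen social media content and classify it effectively. This paper applies transfer learning with bidirectional encoder representations from transformers (BERT) to build a text classification model, and performs multi-label classification of Weibo posts from the "golden" 72 h after earthquake events. Oriented to emergency needs, the labels are divided into five types: hazard information, loss information, rescue and relief information, public opinion information, and useless information, so that fine-grained thematic information usable for disaster analysis can be mined in a targeted way. The proposed model reaches classification accuracies of 97% on the training set and 92% on the test set, effectively improving classification accuracy for Weibo text. The evaluation shows that the model classifies earthquake disaster label information in social media well and can be applied to rapid disaster assessment for earthquake events; obtaining disaster information from social media in this way compensates for the lag of traditional disaster information acquisition.
Abstract (journal's English version):
Objectives: With the rapid development of the Internet, social media has become an important information source for emergency events. However, social media contains much duplicated, erroneous, and even malicious content, which needs to be classified effectively to provide more accurate information for disaster emergency response.
Methods: Deep learning has greatly improved the accuracy and efficiency of text classification. Taking earthquake disasters as an example, this paper builds a multi-label classification model based on transfer learning with bidirectional encoder representations from transformers (BERT). Over 50,000 posts about 5 earthquakes are collected as training samples from SINA Weibo, a very popular social media platform in China. Each sample is manually marked with one or more labels: hazard information, loss information, rescue information, public opinion information, and useless information.
Results: After fine-tuning, the classification accuracies of the proposed model on the training dataset and the test dataset reach 97% and 92%, respectively. The area under the curve (AUC) score of each label ranges from 0.952 to 0.998.
Conclusions: The results show that multi-label classification using BERT transfer learning is highly reliable. The proposed model can be applied to emergency management services for earthquake events, which benefits rapid disaster rescue and relief.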
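As an illustration of what "multi-label" means here, the following is a minimal sketch of how per-label scores could be turned into a set of labels for one post. It is not the authors' implementation: the record does not state their decoding rule, so the independent sigmoid per label, the 0.5 threshold, the English label identifiers, and the example logits are all assumptions made for illustration. In a real pipeline the logits would come from the fine-tuned BERT classification head.

```python
import math

# Hypothetical identifiers for the five label types named in the abstract.
LABELS = ["hazard", "loss", "rescue", "public_opinion", "useless"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent probability meets the threshold.

    Unlike single-label (softmax) classification, each label is scored
    on its own, so a post may receive zero, one, or several labels.
    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits for a single post; real values would come from the model.
example_logits = [2.0, 1.0, -3.0, -0.5, -4.0]
predicted = decode_multilabel(example_logits)
```

With these made-up logits the post is assigned both the hazard and loss labels, which is the situation a single-label classifier cannot express.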
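Keywords: BERT; pre-trained model; transfer learning; social media; earthquake disaster; disaster emergency response; multi-label classification
Classification code: P237 [Astronomy and Earth Sciences - Photogrammetry and Remote Sensing]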