Authors: SU Xiao-hui [1]; ZHANG Xiao-dong [2]; HU Chun-lei; ZOU Zai-chao; QIU Xiao-kang
Affiliations: [1] School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; [2] College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Source: Geography and Geo-Information Science (《地理与地理信息科学》), 2018, No. 4, pp. 90-95 (6 pages)
Funding: National Key Research and Development Program of China (2016YFB0502502); Fundamental Research Funds for the Central Universities (BLX2013034)
Abstract: With the development of network communication technology and the spread of social media tools, more and more members of the public post and share earthquake-related information on microblog platforms. How to extract useful information from these posts and use it to give directional guidance for earthquake emergency work has become both a research focus and a difficulty. This paper proposes an improved TF-PDF algorithm that determines the weight of earthquake topic feature terms from the influence of the posting blogger and the attention the post receives. The earthquake microblog messages are first segmented with the ICTCLAS word segmentation system. The candidate topic words in the resulting vocabulary are then ranked by weight to obtain the hot topic words for the earthquake. Using microblog messages from the Lushan earthquake and the Yiliang earthquake in Yunnan as examples, the traditional TF-PDF algorithm is compared with the improved one. The results show that the hot topic words found by traditional TF-PDF are mostly location information, whereas the improved method more effectively reveals what the public experienced during the earthquake and can provide timely information and support for disaster relief.

English abstract: As network communication technology develops and social media tools become popular, more and more members of the public publish and disseminate earthquake information on microblog platforms. To provide directional guidance for earthquake emergency work, this paper proposes an improved TF-PDF algorithm for finding hot topic words about an earthquake in microblog messages. In the improved algorithm, the weight of each feature term is calculated from the influence of the microblog message, which is estimated from the author's influence and the public attention the message receives. First, the microblog data are segmented into words with the ICTCLAS system. Then the influence of each message is calculated from the author's influence and the attention the message receives, and the word frequency of each feature term is estimated according to that influence. Finally, all feature terms are ranked by word frequency and the top terms are taken as the hot topic words for the earthquake. The Lushan and Yiliang earthquakes are used as application examples for the original and improved TF-PDF algorithms. The results show that the improved algorithm is more useful for earthquake relief operations. Hot topic words extracted from microblog messages contain location information, and the spatial distribution of a large number of microblog messages is also studied. Extracting hot topic words about an earthquake from a large volume of microblog messages can therefore give not only the topics the public focuses on but also the spatial distribution of those topics.
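Keywords: earthquake hot topic words; information extraction; microblog crawling; microblog influence; TF-PDF
Classification code: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]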
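
The abstract does not give the improved weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It implements the classic TF-PDF weight, W_j = Σ_c |F_jc| · exp(n_jc / N_c) with |F_jc| = F_jc / sqrt(Σ_k F_kc²), and, as a labeled assumption, lets each post contribute its `influence` score instead of 1 to the term frequency F_jc. The names `tf_pdf`, `hot_topic_words`, the `influence` value, and the notion of what counts as a "channel" for microblog data (for example a time window or a source) are all hypothetical. Tokens are assumed to be already segmented, for instance by ICTCLAS.

```python
import math
from collections import Counter, defaultdict


def tf_pdf(posts_by_channel):
    """Score terms with TF-PDF.

    posts_by_channel maps a channel name to a list of (tokens, influence)
    pairs. With influence == 1.0 for every post this is classic TF-PDF.
    Any other influence value is a HYPOTHETICAL weighting: the source
    abstract does not specify how influence enters the formula.
    """
    weights = defaultdict(float)
    for posts in posts_by_channel.values():
        if not posts:
            continue
        term_freq = Counter()  # F_jc: (influence-weighted) term frequency
        doc_freq = Counter()   # n_jc: number of posts containing the term
        for tokens, influence in posts:
            for token in tokens:
                term_freq[token] += influence
            doc_freq.update(set(tokens))
        # Normalise term frequencies within the channel.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        for term, freq in term_freq.items():
            # exp(n_jc / N_c) rewards terms that appear in many posts.
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / len(posts))
    return dict(weights)


def hot_topic_words(posts_by_channel, k=10):
    """Return the k highest-weighted terms, best first."""
    weights = tf_pdf(posts_by_channel)
    return sorted(weights, key=weights.get, reverse=True)[:k]
```

Under this sketch, a term used in a single post by a highly influential author can outrank a place name repeated across several low-influence posts, which is the kind of shift from location words toward public reactions that the abstract reports.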