A Methodological and Empirical Study of Extracting Event Information from Textual Historical Materials Based on Conditional Random Fields: Taking the Digital Humanities Study of Rabe's Diary as an Example  Cited by: 3


Authors: Zhao Xiaoxuan; Chen Gang [1,2,3]; Huang Zijing (School of Geography and Ocean Science, Nanjing University; Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology; Key Laboratory for Land Satellite Remote Sensing Applications of Ministry of Natural Resources)

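Affiliations: [1] School of Geography and Ocean Science, Nanjing University [2] Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology [3] Key Laboratory for Land Satellite Remote Sensing Applications of Ministry of Natural Resources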

Source: Library Journal, 2024, No. 3, pp. 101-108, 115 (9 pages in total)

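Funding: National Natural Science Foundation of China project "Restoration of the Historical Urban Form of Nanjing and Study of Its Landscape Change Based on Modern Maps (1840-1937)" (Grant No. 42071172); one of the research outputs of the Nanjing University 2021 "Innovation and Entrepreneurship" project "The Light of Humanity in the Long Night: A Story Map of the Nanjing International Safety Zone (1937-1938)".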

Abstract: Textual historical materials have been widely digitized, and extracting geographic named entities and related information from them for effective geographic information mining has become an important research topic. Addressing the characteristics of historical archival documents, this paper proposes an event extraction approach that takes geographic named entities as its core, links semantic information to geographic locations, and converts the event information described in the text into attribute data of each geographic named entity. The approach extracts the event elements related to geographic named entities, including time, place, persons, things, events, and phenomena. The study takes the archive Japanese Soldiers' Atrocities in the Nanking Safety Zone, included in Rabe's Diary, as an empirical case, uses the conditional random field method to extract event information, and, combined with historical maps and other related materials, finally maps the geographic information onto a map. The method helps expand the ways textual materials are developed and used in the digital information era and opens up new ideas for text mining analysis and knowledge discovery.

Keywords: conditional random fields; feature template; digital humanities; information extraction; geographic named entities

Classification: G250.7 [Cultural Science - Library Science]
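Illustrative sketch: the abstract describes turning event information into attribute data attached to each geographic named entity, with a conditional random field doing the labelling. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The tag names (TIME, PERSON, EVENT, PLACE), the function names, the context-window feature template, and the toy sentence with its hand-written labels are all hypothetical; this record does not give the paper's actual feature template or tag set, and no CRF is trained here.

```python
from collections import defaultdict


def token_features(tokens, i, window=1):
    """Context-window features for token i, in the spirit of a CRF feature template."""
    feats = {"bias": 1.0, "w[0]": tokens[i]}
    for d in range(1, window + 1):
        feats[f"w[-{d}]"] = tokens[i - d] if i - d >= 0 else "<BOS>"
        feats[f"w[+{d}]"] = tokens[i + d] if i + d < len(tokens) else "<EOS>"
    return feats


def decode_bio(tokens, labels):
    """Group BIO-labelled tokens into (element_type, text) spans."""
    spans, cur_type, cur_tokens = [], None, []
    for tok, lab in zip(tokens, labels):
        starts_new = lab.startswith("B-") or (
            lab.startswith("I-") and lab[2:] != cur_type
        )
        if starts_new:
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = lab[2:], [tok]
        elif lab.startswith("I-"):
            cur_tokens.append(tok)  # continue the current span
        else:  # "O": outside any event element
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


def attach_to_places(spans):
    """Turn one event's spans into attribute records keyed by place name."""
    places = [text for typ, text in spans if typ == "PLACE"]
    attrs = defaultdict(list)
    for typ, text in spans:
        if typ != "PLACE":
            attrs[typ.lower()].append(text)
    return {place: dict(attrs) for place in places}


# Toy input (invented for illustration); the labels stand in for CRF output.
tokens = ["On", "December", "15", "soldiers", "entered", "the", "school"]
labels = ["O", "B-TIME", "I-TIME", "B-PERSON", "B-EVENT", "B-PLACE", "I-PLACE"]

spans = decode_bio(tokens, labels)
records = attach_to_places(spans)
```

In this sketch each place name becomes a key whose value holds the time, person, and event elements found in the same event description; geocoding those keys against a historical map would be a separate step.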

 
