Research on Extraction and Integration of Developing Events Based on Analysis of Space-Time Information (基于时空分析的线索性事件的抽取与集成系统研究)  Cited by: 21


Authors: Wu Pingbo (吴平博)[1], Chen Qunxiu (陈群秀)[1], Ma Liang (马亮)[1]

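Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084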

Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, No. 1, pp. 21-28 (8 pages)

Funding: Supported by the National 863 Program of China (2001AA114040)

Abstract: Information extraction (IE) technology can provide high-quality retrieval services. Targeting news events on the web, this paper builds a system that extracts and integrates the key information of events that interest people. The system adopts the following methods and strategies: (1) Extraction rules are built from sentence-pattern templates, and event information is then extracted directly from text in which temporal phrases (TP) and spatial phrases (SP) have already been recognized and normalized. Skipping deep syntactic parsing in this way makes the system easier to implement. (2) The normalized temporal and spatial information of events is used to link the same event across different documents, so that the information about that event can be merged. (3) When a document shifts from one event to another, the document is segmented by event, which resolves the problem of attributing scattered pieces of information within a document to the correct event. Preliminary experiments show that the methods and strategies adopted in this paper are effective.
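Strategy (2) in the abstract, linking the same event across documents through its normalized time and place, can be illustrated with a minimal sketch. This is not the authors' implementation: the `EventRecord` class, the `merge_events` function, the slot names, and the sample records are all hypothetical, and the sketch assumes that normalization has already reduced each temporal phrase to an ISO-style date string and each spatial phrase to a canonical place name.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    """One extracted event mention (hypothetical structure, not from the paper)."""
    doc_id: str
    time: str    # assumed already normalized, e.g. "2005-03-28"
    place: str   # assumed already normalized to a canonical place name
    slots: dict = field(default_factory=dict)


def merge_events(records):
    """Group records that share a normalized (time, place) key and union their slots.

    When two documents fill the same slot, the first value seen is kept;
    a real system would need a conflict-resolution policy here.
    """
    merged = {}
    for rec in records:
        key = (rec.time, rec.place)
        entry = merged.setdefault(key, {"docs": [], "slots": {}})
        entry["docs"].append(rec.doc_id)
        for name, value in rec.slots.items():
            entry["slots"].setdefault(name, value)
    return merged


# Made-up sample records, for illustration only.
records = [
    EventRecord("doc1", "2005-03-28", "Sumatra", {"type": "earthquake"}),
    EventRecord("doc2", "2005-03-28", "Sumatra", {"casualties": "unknown"}),
    EventRecord("doc3", "2005-04-01", "Beijing", {"type": "conference"}),
]
merged = merge_events(records)
```

Exact-match keys are the simplest possible choice. Matching that tolerates differing granularity (a day versus a month, a city versus a province) would require comparing normalized values rather than testing them for equality.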

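Keywords: computer application; Chinese information processing; information extraction; sentence-pattern template; developing event; space-time information; event merging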

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
