Automatic Enrichment of Domain Ontology: A Case Study on the ADL Gazetteer  Cited by: 5

Authors: Ge Ning[1], Wang Jun[1]

Affiliation: [1] Department of Information Management, Peking University, Beijing 100871

Source: Computer Science (《计算机科学》), 2007, No. 9, pp. 156-162 (7 pages)

Funding: National Natural Science Foundation of China (70303002)

Abstract: Domain ontologies have become increasingly useful in recent years. A critical issue for their wider application is automatic enrichment: adding new terminology and relationships so that an ontology keeps pace with advancing domain knowledge. This paper proposes a method for automatically enriching a domain ontology, using the Feature Type Thesaurus (FTT) as the case study and drawing on a large-scale experiment on the ADL Gazetteer, one of the largest digital gazetteers in the world. The ADL Gazetteer is a worldwide digital gazetteer developed in the Alexandria Digital Library (ADL) Project. The FTT is a hierarchically organized scheme that describes more than 200 geographic feature types and is used to index and organize the 6 million place names in the gazetteer.

The enrichment proceeds in two steps. (1) Generic terms that denote geographic feature types and have retrieval value are extracted from the place names; a method motivated by pointwise mutual information extracts frequent words and phrases. (2) Based on the co-occurrence between these candidates and the indexing type terms, a hierarchical clustering of words in the same word family identifies the generic terms among the extracted items and determines the FTT type term corresponding to each, which places the generic term in the FTT hierarchy. Positioning the generic terms by making full use of the large body of existing indexed data is the core of the work, and the experiment reached an effectiveness of 82.7%. In essence, the study automatically extracts domain knowledge and indexing knowledge from an ontology-indexed corpus in order to enrich the ontology. The approach can be applied to other similar ontology-indexed corpora and knowledge bases, such as dictionaries and catalogs, and can serve as an assistant tool for end-user search, terminology discovery, ontology enrichment, and interoperation among different ontologies.
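To make the co-occurrence idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It keeps words that recur across place names as candidate generic terms and assigns each one to the type term with which it has the highest pointwise mutual information. The toy records, the type-term labels, the function name `map_generic_terms`, and both thresholds are hypothetical. The paper's PMI-motivated phrase extraction and hierarchical clustering are not reproduced here; plain word/type-term PMI is swapped in for illustration.

```python
import math
from collections import Counter

# Hypothetical toy records: (place name, type term used to index it).
# These are illustrative values, not data from the ADL Gazetteer.
RECORDS = [
    ("Clear Lake", "lakes"),
    ("Mirror Lake", "lakes"),
    ("Lake Tahoe", "lakes"),
    ("Bear Creek", "streams"),
    ("Mill Creek", "streams"),
    ("Clear Creek", "streams"),
]


def map_generic_terms(records, min_count=2, min_pmi=0.5):
    """Map recurring words in place names to their most associated type term.

    min_count and min_pmi are assumed thresholds chosen for the toy data.
    """
    n = len(records)
    word_count, type_count, pair_count = Counter(), Counter(), Counter()
    for name, type_term in records:
        type_count[type_term] += 1
        # Count each word once per record so counts are record frequencies.
        for word in set(name.lower().split()):
            word_count[word] += 1
            pair_count[(word, type_term)] += 1

    best = {}  # word -> (type term, PMI) with the highest PMI seen so far
    for (word, type_term), joint in pair_count.items():
        if word_count[word] < min_count:
            continue  # too rare to be treated as a candidate generic term
        # PMI = log2( P(word, type) / (P(word) * P(type)) )
        pmi = math.log2(joint * n / (word_count[word] * type_count[type_term]))
        if pmi >= min_pmi and pmi > best.get(word, (None, float("-inf")))[1]:
            best[word] = (type_term, pmi)
    return {word: type_term for word, (type_term, _) in best.items()}


mapping = map_generic_terms(RECORDS)
```

On this toy data the mapping assigns "lake" to "lakes" and "creek" to "streams". The word "clear" recurs but is dropped, because it co-occurs equally with both type terms and its PMI is 0 for each.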

Keywords: domain ontology; automatic enrichment; term extraction; generic terms; gazetteer

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
