Source: Computer Science (《计算机科学》), 2007, No. 9, pp. 156-162 (7 pages)
Funding: National Natural Science Foundation of China (grant 70303002)
Abstract: Using a geographic Feature Type Thesaurus (FTT) as a case study, this paper proposes a method for automatically enriching a domain ontology. The FTT describes more than 200 geographic feature types, organized hierarchically, and is used to index and organize the 6 million place names in the Alexandria Digital Library Gazetteer (ADL Gazetteer). To enrich the FTT automatically, (1) generic terms that denote geographic feature types and have retrieval value are first extracted and discovered from the place names; (2) based on their co-occurrence with the indexing thesaurus terms, the corresponding thesaurus term is determined while clustering words of the same word family, and the extracted generic terms are then positioned in the FTT hierarchy. The core of the work is positioning generic terms by exploiting the large body of existing indexed data, and experiments show an effectiveness of 82.7%. In essence, the study automatically extracts domain knowledge and indexing knowledge from an ontology-indexed corpus in order to enrich the ontology. The method can be applied to similar corpora and knowledge bases for new-term discovery, ontology self-enrichment, and interoperation.

The utility of domain ontologies has been increasing in recent years. A critical issue for their wider application is automatic enrichment, i.e. adding new terminology and relationships to reflect advancing domain knowledge. Based on a large-scale experiment on the ADL Gazetteer, one of the largest digital gazetteers in the world, a solution to this problem is proposed. The ADL Gazetteer is a digital worldwide gazetteer developed in the Alexandria Digital Library (ADL) Project, which contains millions of geographic names (place names). The place names are indexed with type terms from the ADL Feature Type Thesaurus (FTT), a hierarchical category scheme, which is the ontology studied in this research. To discover generic terms in place names and to correlate the extracted generic terms with the corresponding type terms in the FTT, a method motivated by pointwise mutual information is used to extract frequent words and phrases from the place names, and a hierarchical clustering algorithm is used to identify the generic terms among the extracted candidates and to determine the FTT concepts to which they correlate. The effectiveness of the experiment reached 82.7%. The approach can be applied to other similar ontology-indexed corpora, such as dictionaries and catalogs, and can serve as an assistant tool for end-user search, terminology discovery, ontology enrichment, and interoperation among different ontologies.
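Keywords: domain ontology; automatic enrichment; term extraction; generic terms; gazetteer
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]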
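The abstract describes relating words extracted from place names to the feature type terms they co-occur with. The following is a minimal Python sketch of that general idea, not the authors' implementation: it scores word/type co-occurrence with plain pointwise mutual information and omits the hierarchical clustering step entirely. The toy records, the function names (`frequent_words`, `pmi_table`, `best_type`), and the `min_count` threshold are all assumptions introduced here for illustration.

```python
import math
from collections import Counter

# Hypothetical toy entries: (place name, feature type term).
# Invented for illustration; not taken from the ADL Gazetteer.
RECORDS = [
    ("Clear Lake", "lakes"),
    ("Mirror Lake", "lakes"),
    ("Lake Tahoe", "lakes"),
    ("Bear Creek", "streams"),
    ("Mill Creek", "streams"),
    ("Clear Creek", "streams"),
    ("Bear Mountain", "mountains"),
    ("Table Mountain", "mountains"),
]


def frequent_words(records, min_count=3):
    """Return the words that appear in at least min_count place names."""
    counts = Counter()
    for name, _ in records:
        counts.update(set(name.lower().split()))
    return {word for word, c in counts.items() if c >= min_count}


def pmi_table(records):
    """Return {(word, type): PMI in bits} from co-occurrence over records."""
    word_counts = Counter()
    type_counts = Counter()
    pair_counts = Counter()
    for name, ftype in records:
        type_counts[ftype] += 1
        for word in set(name.lower().split()):
            word_counts[word] += 1
            pair_counts[(word, ftype)] += 1
    n = len(records)
    return {
        (w, t): math.log2((c / n) / ((word_counts[w] / n) * (type_counts[t] / n)))
        for (w, t), c in pair_counts.items()
    }


def best_type(word, table):
    """Return the type term with the highest PMI for word, or None if unseen."""
    scored = [(score, t) for (w, t), score in table.items() if w == word]
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    table = pmi_table(RECORDS)
    for word in sorted(frequent_words(RECORDS)):
        print(word, "->", best_type(word, table))
```

On such a small sample, incidental words like "bear" also receive a high score for one type, which is one reason a frequency threshold and a further clustering step, as the abstract describes, would be needed in practice.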