An ontology-theme-based method of acquiring knowledge from Chinese natural language documents (一个基于本体主题的中文知识获取方法)  Cited by: 5


Authors: Che Haiyan (车海燕)[1], Sun Jigui (孙吉贵)[1], Jing Tao (荆涛)[1], Bai Xi (白曦)[1]

Affiliation: [1] College of Computer Science and Technology, Jilin University, Changchun 130012, China

Source: 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), 2007, No. 2, pp. 206-215 (10 pages)

Funding: the Key Project of the National Natural Science Foundation of China under Grant No. 60496321; the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20050183065.

Abstract: The characteristics of the Chinese language make it very difficult to acquire knowledge from Chinese natural language documents. Although Chinese named entity recognition (NER) has achieved fairly good results, it is almost impossible to correctly extract the binary relationships between recognized entities without a synonym table or a Chinese linguistic knowledge base similar to WordNet. This paper proposes an ontology-theme-based method for extracting these relationships from Chinese natural language documents. The method introduces the notion of themes into a domain ontology for the first time: domain experts partition the concepts and properties of the original domain ontology by theme and establish mappings from concepts to themes and from themes to properties. When a sentence is processed, simple NER and direct string-to-ontology matching first identify some of the concepts, individuals, and properties it contains. This accurately identified information is then used to infer the theme of the sentence, and the theme in turn provides clues for finding further relationships. Preliminary experimental results indicate that the method achieves higher recall and precision than methods that do not use theme information.
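The theme-inference step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy ontology, the concept, theme, and property names, and the simple majority-vote rule are all hypothetical, since the abstract does not specify how themes are scored.

```python
from collections import Counter

# Hypothetical toy ontology. In the paper these mappings are built by
# domain experts; every name below is an illustrative assumption.
CONCEPT_TO_THEMES = {
    "Person": {"Education", "Employment"},
    "University": {"Education"},
    "Company": {"Employment"},
}
THEME_TO_PROPERTIES = {
    "Education": {"graduatedFrom", "studiesAt"},
    "Employment": {"worksFor", "foundedBy"},
}


def infer_themes(recognized_concepts):
    """Return the theme(s) with the most votes from the recognized concepts.

    Majority voting is an assumed scoring rule, chosen for simplicity.
    """
    votes = Counter()
    for concept in recognized_concepts:
        for theme in CONCEPT_TO_THEMES.get(concept, ()):
            votes[theme] += 1
    if not votes:
        return set()
    best = max(votes.values())
    return {theme for theme, count in votes.items() if count == best}


def candidate_properties(recognized_concepts):
    """Use the inferred themes to narrow the properties worth searching for."""
    themes = infer_themes(recognized_concepts)
    properties = set()
    for theme in themes:
        properties |= THEME_TO_PROPERTIES[theme]
    return themes, properties


if __name__ == "__main__":
    # Concepts assumed to have been found by NER and ontology matching.
    print(candidate_properties(["Person", "University"]))
```

In this sketch, a sentence in which a person and a university were recognized is assigned the hypothetical "Education" theme, so only that theme's properties are considered as candidate relationships between the two entities.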

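Keywords: domain ontology; theme information; Chinese natural language; knowledge acquisition; method; natural language; recognition; property; concept; linguistic knowledge base; theme idea; knowledge extraction; domain expert; acquiring knowledge; association relation; ontology mapping; precision; recall; clue; document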

Classification: TP [Automation and Computer Technology]

 
