Attribute Information Extraction from Machine-Readable Dictionaries for Generic Attribute Types

ACQUISITION OF ATTRIBUTE INFORMATION OF MACHINE-READABLE DICTIONARY IN GENERIC TYPE


Authors: Wang Suitao[1], Lu Ruzhan[1]

Affiliation: [1] Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240

Source: Computer Applications and Software (《计算机应用与软件》), 2011, No. 4, pp. 1-3, 16 (4 pages)

Funding: National Natural Science Foundation of China (60873135)

Abstract: To build entity-relationship networks and to improve concept-based information retrieval, this paper proposes a method for extracting attribute-value information about concept instances from a machine-readable dictionary without targeting any specific attribute type. First, an initial set of entity-attribute value pairs is generated through manual annotation and selection, and a rough set of pattern instances is extracted from it. Second, the pattern instance set is clustered, merged, and expanded to yield several groups of pattern instances, each group representing one attribute type. Finally, attribute-value information for new entity words is extracted from the dictionary. Synonym expansion and lexical semantic similarity computation are introduced when processing the pattern instance set in order to improve the coverage of the pattern instances. In the experiment, extraction was performed on electronics-domain vocabulary in the Standard Dictionary of Modern Chinese (《现代汉语规范词典》) and achieved good results.
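The three steps summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dictionary entries, seed pairs, synonym table, the two-token left-context pattern, and the Jaccard token overlap (used here as a simple stand-in for the lexical semantic similarity the paper mentions) are all assumptions made only for illustration.

```python
# Toy dictionary: entry word -> definition (hypothetical data, not from the paper).
DICTIONARY = {
    "diode": "an electronic component made of semiconductor material",
    "resistor": "an electronic component made of carbon film",
    "capacitor": "an electronic device used for storing charge",
    "transistor": "an electronic component built from silicon",
    "antenna": "an electronic device used for receiving signals",
}

# Step 1 input: hand-labelled (entity, attribute value) seed pairs (hypothetical).
SEEDS = [
    ("diode", "semiconductor material"),
    ("resistor", "carbon film"),
    ("capacitor", "storing charge"),
]

# Hypothetical synonym table used to widen pattern coverage.
SYNONYMS = {"built": "made", "from": "of"}


def normalize(token):
    """Map a token to its canonical synonym, if one is listed."""
    return SYNONYMS.get(token, token)


def extract_pattern(definition, value, window=2):
    """Rough pattern instance: the `window` tokens just left of the value."""
    idx = definition.find(value)
    if idx < 0:
        return None
    left = definition[:idx].split()
    return tuple(normalize(t) for t in left[-window:])


def jaccard(a, b):
    """Token-overlap similarity (stand-in for semantic similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def cluster(patterns, threshold=0.5):
    """Step 2: greedily merge similar patterns; each group ~ one attribute type."""
    groups = []
    for pat in patterns:
        for group in groups:
            if any(jaccard(pat, other) >= threshold for other in group):
                group.add(pat)
                break
        else:
            groups.append({pat})
    return groups


def apply_patterns(groups, definition):
    """Step 3: return (group index, value) for the first pattern that matches."""
    tokens = definition.split()
    for gid, group in enumerate(groups):
        for pat in group:
            n = len(pat)
            # Stop early enough that at least one token follows the pattern.
            for i in range(len(tokens) - n):
                if tuple(normalize(t) for t in tokens[i:i + n]) == pat:
                    return gid, " ".join(tokens[i + n:])
    return None


def extract_all(dictionary, seeds):
    """Run the seed -> pattern -> cluster -> extract pipeline on new entries."""
    patterns = []
    for entity, value in seeds:
        pat = extract_pattern(dictionary[entity], value)
        if pat:  # skip missing or empty patterns
            patterns.append(pat)
    groups = cluster(patterns)
    seeded = {entity for entity, _ in seeds}
    results = {}
    for entity, definition in dictionary.items():
        if entity in seeded:
            continue
        hit = apply_patterns(groups, definition)
        if hit:
            results[entity] = hit
    return groups, results
```

Calling `extract_all(DICTIONARY, SEEDS)` on this toy data groups the seed patterns into two attribute types and extracts values for the two unseeded entries; the "transistor" entry is matched only because the synonym table maps "built from" onto the seed pattern "made of", which is the kind of coverage gain the abstract attributes to synonym expansion.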

Keywords: information extraction; pattern instance; similarity; generic type

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
