Domain Thesaurus Construction Based on Wiki Hyperlink Structure Graph Clustering

Cited by: 7

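Affiliations: [1] Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027 [2] Key Laboratory of Information Systems Engineering, the 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007 [3] Department of Automation, University of Science and Technology of China, Hefei 230021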

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 6, pp. 1286-1292 (7 pages)

Funding: Supported by the National Key Technology R&D Program of China (2011BAH11B01)

Abstract: Domain thesauri play an important role in information retrieval, natural language processing, and question answering systems. Because natural language is complex, NLP-based methods for constructing domain thesauri rarely achieve satisfactory results. In recent years, Wiki encyclopedias have come into wide use; a Wiki contains not only a massive number of articles but also a rich link structure. Based on two properties of hyperlinks, anchor descriptiveness and topic locality, this paper proposes a method that automatically constructs a domain thesaurus by clustering a weighted undirected link structure graph. The method first uses the Wiki to build an undirected link structure graph for a specific domain, then uses the LSI algorithm and cosine similarity to compute the weight of each link, and finally applies the CPMw algorithm to cluster the weighted undirected graph, which yields the domain thesaurus. Experiments show that the proposed method obtains better domain thesaurus construction results.

Keywords: domain thesaurus construction; Wiki; CPMw; LSI

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
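
The pipeline outlined in the abstract (weight each link by cosine similarity, then cluster the weighted undirected graph) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each article already has a low-dimensional vector, such as one produced by LSI, which is not computed here. It also assumes that CPMw denotes weighted clique percolation, where a k-clique is kept only if its intensity (the geometric mean of its edge weights) exceeds a threshold and kept cliques that share k-1 nodes are merged. The function names, the toy vectors, the links, `k=3`, and the threshold `0.5` are all hypothetical.

```python
from itertools import combinations
from math import prod, sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weight_edges(links, vectors):
    """Attach a cosine-similarity weight to every undirected link."""
    return {frozenset(e): cosine(vectors[e[0]], vectors[e[1]]) for e in links}


def cpmw(weights, k=3, intensity_threshold=0.5):
    """Weighted clique percolation sketch (brute force, small graphs only)."""
    nodes = sorted({n for e in weights for n in e})
    cliques = []
    for combo in combinations(nodes, k):
        edges = [frozenset(p) for p in combinations(combo, 2)]
        if all(e in weights for e in edges):
            # Clamp negative similarities so the geometric mean is defined.
            w = [max(weights[e], 0.0) for e in edges]
            intensity = prod(w) ** (1.0 / len(w))
            if intensity > intensity_threshold:
                cliques.append(frozenset(combo))

    # Union-find: merge kept cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical 2-D "LSI" vectors and links for a toy graph.
vectors = {
    "graph": (0.9, 0.1), "clustering": (0.8, 0.2), "clique": (0.85, 0.15),
    "community": (0.9, 0.2), "football": (0.1, 0.9), "stadium": (0.2, 0.8),
    "goal": (0.15, 0.9),
}
links = [
    ("graph", "clustering"), ("graph", "clique"), ("clustering", "clique"),
    ("clustering", "community"), ("clique", "community"),
    ("community", "football"), ("football", "stadium"),
    ("football", "goal"), ("stadium", "goal"),
]

communities = cpmw(weight_edges(links, vectors), k=3, intensity_threshold=0.5)
print(communities)
```

Each returned group is a candidate set of domain terms. The brute-force clique enumeration is only practical for small graphs; a real Wiki link graph would need a dedicated clique-finding routine.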
