Research on Methods for Mining Query Classification Knowledge from Wikipedia  Cited by: 1

Query Classification Knowledge Extraction from Wikipedia


Authors: Duan Jianyong (段建勇)[1], Dou Guanghui (窦光辉), Zhang Mei (张梅)[1], Xie Yuchao (谢宇超)[1]

Affiliation: [1] College of Information Engineering, North China University of Technology, Beijing 100144

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 7, pp. 1591-1595 (5 pages)

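Funding: Supported by the National Natural Science Foundation of China (61103112); the National Social Science Foundation of China (11CTQ036); the Beijing Philosophy and Social Science Planning Fund (13SHC031); and the State Language Commission Twelfth Five-Year Plan Fund (YB125-10)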

Abstract: Query classification requires a taxonomy of classification knowledge for query intents. The classification knowledge in each query category is relatively large in scale, so traditional approaches depend on large classifiers and still cannot guarantee that every query category is covered. This paper proposes a random-walk-based method for mining query classification knowledge from Wikipedia. Wikipedia concepts serve as the intent representation space, and each intent domain is represented as a set of Wikipedia articles and categories. All Wikipedia entries and category knowledge are first extracted to form this set. A graph is then built from Wikipedia's knowledge structure, and a random walk traverses all concept nodes in it. The walk yields, for each node, a probability of belonging to the intent, which is converted into a classification weight. The result is a query knowledge link graph, from which the final prediction of query intent is made. Introducing Wikipedia as external knowledge addresses the data sparseness problem, and the random walk builds indirect connections among concepts and categories, so similarity can be computed for queries that are not directly linked, which improves the coverage of query classification. Experimental results show that the method can effectively locate the domain of a user's query.
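The abstract does not state which random walk variant the authors use, so the following is only a minimal sketch under assumptions: a random walk with restart over a small undirected graph of articles and categories, seeded with nodes assumed to belong to one intent domain. The function name `random_walk_scores`, the `restart` value, and the toy node names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def random_walk_scores(edges, seeds, restart=0.15, iterations=50):
    """Random walk with restart over an undirected concept graph.

    edges: iterable of (node_a, node_b) links between articles and categories.
    seeds: nodes assumed to belong to the intent domain.
    Returns a dict mapping each node to its approximate visit probability.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    seeds = [s for s in seeds if s in neighbors]
    if not seeds:
        raise ValueError("no seed node appears in the graph")

    # The walker starts at, and restarts from, a uniformly chosen seed node.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Spread the remaining probability mass evenly over the neighbors.
            share = (1.0 - restart) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        scores = nxt
    return scores


# Toy graph (hypothetical data): two domains joined by a disambiguation page.
edges = [
    ("Jaguar (car)", "Category:Automobiles"),
    ("Sedan", "Category:Automobiles"),
    ("Jaguar (animal)", "Category:Felines"),
    ("Lion", "Category:Felines"),
    ("Jaguar (car)", "Jaguar (disambiguation)"),
    ("Jaguar (animal)", "Jaguar (disambiguation)"),
]
scores = random_walk_scores(edges, seeds=["Category:Automobiles"])
for node, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {p:.3f}")
```

In this sketch the per-node probabilities could serve directly as weights for the seeded intent domain. Concepts with no direct link to the seed category still receive a nonzero score through intermediate nodes, which is the kind of indirect connection the abstract describes.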

Keywords: random walk; query classification; Wikipedia; information extraction

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
