A Novel Approach for Co-occurrence Clustering Analysis: Maximal Frequent Itemset Mining

Cited by: 22


Authors: Xu Shuo (徐硕)[1], Qiao Xiaodong (乔晓东)[1], Zhu Lijun (朱礼军)[1], Zhang Yunliang (张运良)[1], Xue Chunxiang (薛春香)[2]

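Affiliations: [1] Institute of Scientific and Technical Information of China, Beijing 100038; [2] School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094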

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2012, No. 2, pp. 143-150 (8 pages)

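Funding: This research was supported by the National Science and Technology Support Program of the 12th Five-Year Plan, project "Research on Key Technologies of Large-Scale Semantic Computing for Foreign-Language Scientific and Technical Knowledge Organization Systems" (2011BAH10804); the Institute of Scientific and Technical Information of China pre-research project "Deep Domain Topic Monitoring of Scientific and Technical Literature and Revealing the Laws of Topic Evolution" (YY-201129); the Jiangsu Provincial Social Science Fund project "Research on Automatic Indexing of Digital Newspapers" (09TQC011); and the Ministry of Education Humanities and Social Sciences Research project "Research on Deep Processing of Electronic Newspaper Content" (09YJC870014).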

Abstract: For the literature of a given field, it is usually assumed that the more frequently two research objects co-occur, the more likely it is that an underlying link exists between them. This assumption has driven the popularity of co-occurrence analysis methods such as co-word analysis, co-citation analysis, and co-authorship analysis. Traditional co-occurrence analysis proceeds in three stages, but the first two stages have certain defects, so the final co-occurrence clustering results may be misleading. To overcome these defects, this paper introduces a new method for co-occurrence clustering analysis from the field of association rule mining: maximal frequent itemset mining. The method compresses the three stages of traditional co-occurrence analysis into a single stage, which greatly simplifies the process, and it makes full use of all available information, thereby overcoming the defects of the traditional approach. Experimental analysis shows that, with a suitable minimum support threshold, the method generally yields satisfactory clustering results.
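To make the idea concrete, the following is a minimal sketch of maximal frequent itemset mining applied to keyword sets, where each document is treated as a transaction. It is not the authors' implementation: the record does not say which mining algorithm the paper uses, so a simple Apriori-style level-wise search is assumed here. The function names `frequent_itemsets` and `maximal_frequent_itemsets`, the toy `docs` data, and the use of an absolute document count for `min_support` are all illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search (an assumed algorithm, not the paper's).

    min_support is an absolute count of transactions (hypothetical convention).
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s)
        # Join step: merge pairs of k-itemsets that differ in exactly one item.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune step: every k-subset must be frequent, then check support.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, len(c) - 1))
            and support(c) >= min_support
        }
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {s: n for s, n in frequent.items()
            if not any(s < other for other in frequent)}


# Toy keyword sets, invented purely for illustration.
docs = [
    {"co-word", "clustering", "bibliometrics"},
    {"co-word", "clustering", "bibliometrics"},
    {"co-word", "clustering"},
    {"itemset", "association rules"},
    {"itemset", "association rules", "clustering"},
]
clusters = maximal_frequent_itemsets(docs, min_support=2)
for itemset, count in clusters.items():
    print(sorted(itemset), count)
```

Each maximal frequent itemset can be read as one cluster of co-occurring keywords, and a keyword may appear in more than one cluster. Raising `min_support` yields fewer and smaller clusters, which is consistent with the abstract's remark that the threshold has to be chosen appropriately.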

Keywords: co-occurrence analysis; co-word analysis; cluster analysis; maximal frequent itemset; hierarchical clustering

Classification number: G254 [Cultural Science: Library Science]

 
