Affiliation: [1] School of Computer and Information Technology, Daqing Petroleum Institute, Daqing, Heilongjiang 163318
Source: Journal of Yangtze University (Natural Science Edition) (Early-Month Issue), 2010, No. 1, pp. 72-75 (4 pages). JOURNAL OF YANGTZE UNIVERSITY (NATURAL SCIENCE EDITION) SCI & ENG
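Funding: Heilongjiang Province Innovation Capability Support Program for Key Teachers at Regular Institutions of Higher Education (1055G002); Natural Science Foundation of Heilongjiang Province (ZA2006-11); Heilongjiang Province Science and Technology Research Program (GZ07A103)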
Abstract: To address automatic text clustering in information mining, a kernel clustering algorithm based on the fuzzy vector space model (FVSM) is proposed. First, fuzzy features are extracted from the texts to be clustered, yielding a set of fuzzy feature items. Next, using this set, the document frequency of each feature item is computed for every text, from which the fuzzy feature vector of each text is obtained. Finally, a Gaussian kernel function maps each text's feature vector into a high-dimensional feature space, where a kernel clustering algorithm performs the clustering. Because feature extraction fully accounts for where feature items occur within a document, the automatic clustering criterion comes closer to manual clustering. The effectiveness of the method is verified on a subset of documents from the China Journal Net full-text database.
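The abstract names the stages of the method but not its formulas, so the following is only a minimal Python sketch of that kind of pipeline under stated assumptions. The section names and values in `POSITION_WEIGHTS`, the membership function 1 - exp(-score), the use of position-weighted term counts in place of the paper's per-text document frequency, the bandwidth `sigma`, and the choice of kernel k-means with farthest-first seeding as the "kernel clustering algorithm" are all hypothetical illustrations, not the authors' published method. The Gaussian kernel is K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), and the squared distance from a mapped point to a cluster mean in the high-dimensional feature space is computed from kernel values alone: K(x, x) - (2/|C|) sum_j K(x, x_j) + (1/|C|^2) sum_{j,l} K(x_j, x_l).

```python
import math

# HYPOTHETICAL position weights: the paper says term position is considered,
# but these section names and values are illustrative assumptions only.
POSITION_WEIGHTS = {"title": 1.0, "keywords": 0.9, "abstract": 0.7, "body": 0.4}


def fuzzy_vector(doc, vocabulary):
    """Map a document (dict: section name -> list of terms) to a fuzzy feature vector.

    Each term's score sums the position weights of its occurrences; the score is
    squashed into a membership degree in [0, 1) with 1 - exp(-score) (assumed).
    """
    scores = dict.fromkeys(vocabulary, 0.0)
    for section, terms in doc.items():
        weight = POSITION_WEIGHTS.get(section, 0.0)
        for term in terms:
            if term in scores:
                scores[term] += weight
    return [1.0 - math.exp(-scores[t]) for t in vocabulary]


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_kmeans(vectors, k, sigma=1.0, max_iter=100):
    """Kernel k-means: clusters in the Gaussian-kernel feature space via the Gram matrix."""
    n = len(vectors)
    K = [[gaussian_kernel(vectors[i], vectors[j], sigma) for j in range(n)] for i in range(n)]

    def pair_dist(i, j):
        # ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Deterministic farthest-first seeding (an assumed choice).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(pair_dist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: pair_dist(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Cluster self-similarity term: (1/|C|^2) * sum_{j,l in C} K_jl
        self_term = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                     for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # ||phi(x_i) - mean_C||^2 expressed with kernel values only
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + self_term[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return labels


if __name__ == "__main__":
    vocab = ["library", "information", "compiler", "software"]
    docs = [
        {"title": ["library"], "body": ["information", "library"]},
        {"keywords": ["library", "information"]},
        {"title": ["compiler"], "body": ["software"]},
        {"keywords": ["compiler", "software"]},
    ]
    vectors = [fuzzy_vector(d, vocab) for d in docs]
    print(kernel_kmeans(vectors, k=2, sigma=0.5))
```

Working only through the Gram matrix means the high-dimensional mapping is never computed explicitly. The bandwidth `sigma` controls how sharply similarity decays with distance and would need tuning on real data.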
Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]