Authors: WANG Liuyang (王留洋); YU Yangxin (俞扬信); CHEN Bolun (陈伯伦); ZHANG Hui (章慧) (Faculty of Computer & Software Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003, China)
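Institution: [1] School of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003
Source: Journal of Computer Applications (《计算机应用》), 2020, No. 4, pp. 1069-1073 (5 pages)
Funding: National Natural Science Foundation of China (61602202).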
Abstract: Different clustering algorithms follow their own design strategies, but each technique has limitations when applied to a particular dataset. An appropriate choice of Discrimination Information Method (DIM) can ensure effective document clustering. To address these problems, a DIM for Document Clustering based on Consensus and Classification (DCCC) was proposed. Firstly, Clustering by DIM (CDIM) was used to generate the initial clustering of the dataset, and two different CDIM methods produced two initial cluster sets. Then, the two initial cluster sets were re-initialized with different parameter settings, and a consensus was built from the relationships between their cluster labels so as to maximize the total discrimination number of the documents. Finally, Discrimination Text Weight Classification (DTWC) was chosen as the text classifier to assign new cluster labels to the consensus; training the classifier altered the base partitions, and the final partition was obtained from the predicted labels. Experiments were carried out on 8 network datasets, using BCubed precision and recall for clustering validation. Experimental results show that the clustering results of the proposed consensus and classification method are better than those of the compared methods.
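The abstract validates clustering with BCubed precision and recall. As a hedged illustration, not the authors' evaluation code, the standard BCubed definition for hard (single-label) clusterings can be sketched as follows: for each document, precision is the fraction of its cluster that shares its reference class, recall is the fraction of its reference class that shares its cluster (both counting the document itself), and each score is averaged over all documents. The function name `bcubed` and the toy labels are hypothetical.

```python
from collections import Counter


def bcubed(pred, gold):
    """Return (precision, recall) under the BCubed definition for hard clusterings.

    pred[i] is the cluster assigned to document i; gold[i] is its reference class.
    """
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")

    cluster_size = Counter(pred)          # documents per predicted cluster
    class_size = Counter(gold)            # documents per reference class
    pair_size = Counter(zip(pred, gold))  # documents sharing both cluster and class

    n = len(pred)
    # Per-document precision: same-cluster-and-class count / cluster size.
    precision = sum(pair_size[(c, g)] / cluster_size[c] for c, g in zip(pred, gold)) / n
    # Per-document recall: same-cluster-and-class count / class size.
    recall = sum(pair_size[(c, g)] / class_size[g] for c, g in zip(pred, gold)) / n
    return precision, recall


if __name__ == "__main__":
    # Toy example (hypothetical data): one document of class "b" is misplaced in cluster 0.
    p, r = bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
    f = 2 * p * r / (p + r)  # harmonic mean, often reported alongside P and R
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.733 R=0.733 F=0.733
```

Counting pairs with a `Counter` keeps the computation linear in the number of documents instead of comparing every pair of documents explicitly.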
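Keywords: consensus clustering; document clustering; discrimination information; cluster label; text classifier
CLC number: TP391.3 [Automation and Computer Technology / Computer Application Technology]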