High-Precision Data Mining Algorithm Based on Multi-Layer Cascaded Minority-Class Clustering  Cited by: 12

High Precision Data Excavating Algorithm Based on Multi-layer Cascade Clustering


Authors: XU Tong-de [1], ZHAO Zhi-jun [2], GAO Jun-wen [1]

Affiliations: [1] Teaching Affairs Office, Guangdong Agriculture Industry Business Polytechnic, Guangzhou 510507, China; [2] Sontan College, Guangzhou University, Guangzhou 511370, China

Source: Control Engineering of China, 2018, No. 5, pp. 829-834 (6 pages)

Funding: Guangdong Provincial Higher Education Research Project (201401154)

Abstract: Classification of class-imbalanced data is an active research topic in machine learning and data mining. Traditional classification algorithms carry a degree of bias, so they classify the minority class poorly. To address this, a high-precision data mining algorithm based on multi-layer cascaded minority-class clustering is proposed. The algorithm performs clustering-based undersampling: it clusters the majority-class samples, extracts as many cluster centroids as there are minority-class samples, and combines these centroids with all minority-class samples to build a new balanced training set. To keep a very small minority class from producing a training set too small for accurate classification, SMOTE oversampling is combined with the clustering-based undersampling. On the balanced training set, K-means clustering is cascaded with the C4.5 decision tree algorithm to refine the classification decision boundary. Experiments show that the algorithm has certain advantages in classifying class-imbalanced data.
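
The resampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans_centroids`, `smote`, `build_balanced_set`), the `target` parameter (the per-class size after resampling), and all default values are assumptions made for illustration, and the K-means and SMOTE routines are simplified textbook versions written with the Python standard library only. The sketch assumes at least two minority samples and `len(minority) <= target <= len(majority)`.

```python
import math
import random


def kmeans_centroids(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def smote(minority, n_new, k_neighbors=3, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its nearest minority neighbours (assumes len(minority) >= 2)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def build_balanced_set(majority, minority, target, seed=0):
    """Hypothetical helper: oversample the minority class to `target`
    samples, undersample the majority class to `target` centroids."""
    minority_up = list(minority) + smote(minority, target - len(minority), seed=seed)
    majority_down = kmeans_centroids(majority, target, seed=seed)
    X = majority_down + minority_up
    y = [0] * len(majority_down) + [1] * len(minority_up)  # 1 = minority
    return X, y
```

Calling `build_balanced_set(majority, minority, target=len(minority))` reduces to pure clustering-based undersampling, with no synthetic samples. The sketch stops at the balanced training set; the cascaded K-means and C4.5 classification stage that the abstract applies to that set is not shown.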

Keywords: data mining; minority-class classification; multi-layer; K-means clustering; C4.5 decision tree

Classification code: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
