Analysis of High-Dimensional Data Mining Algorithms under a Big Data Technology Architecture

Cited by: 7

Author: 李晓辉 (LI Xiao-hui) [1]

Affiliation: [1] Baoji Education Institute of Shaanxi Province, Baoji 721004, Shaanxi Province, China

Source: 《信息技术》 (Information Technology), 2021, No. 10, pp. 122-126 (5 pages)

Abstract: With the rapid growth of the Internet, techniques for processing massive data have attracted increasing attention. Most data in the Internet era is unstructured, and its feature vectors have very high dimensionality. This gives rise to the curse of dimensionality and seriously hinders data processing and storage. To address the problem, a feature selection method is introduced to detect redundant features and obtain a feature subset of the data, thereby reducing dimensionality. Then, building on feature selection and ensemble clustering for high-dimensional data, and taking into account both the quality of the clustering members and the differences among individual data, an ensemble clustering algorithm based on stratified sampling of high-dimensional data is derived. Experimental results show that the algorithm clusters better than the traditional random feature sampling algorithm. The study has research significance and practical value for subsequent data mining work.
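The two steps named in the abstract, removing redundant features and then drawing stratified feature subsets for the members of a clustering ensemble, can be illustrated with a minimal Python sketch. The abstract does not say how redundancy is measured or how strata are formed, so the choices below are assumptions for illustration only: redundancy is taken to be a high absolute Pearson correlation with an already kept feature, and strata are formed by ranking features by variance. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_redundant(columns, threshold=0.95):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    Assumption: 'redundant' means highly correlated with an earlier kept feature.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def stratify_by_variance(columns, indices, n_strata):
    """Rank features by variance and cut the ranking into contiguous strata.

    Assumption: variance rank is used as the stratification variable.
    """
    def variance(col):
        m = sum(col) / len(col)
        return sum((x - m) ** 2 for x in col) / len(col)

    ranked = sorted(indices, key=lambda j: variance(columns[j]))
    size = max(1, math.ceil(len(ranked) / n_strata))
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def stratified_feature_sample(strata, per_stratum, rng):
    """Draw up to per_stratum features from every stratum, without replacement."""
    subset = []
    for group in strata:
        subset.extend(rng.sample(group, min(per_stratum, len(group))))
    return sorted(subset)


# Toy data: each inner list is one feature column over four samples.
columns = [
    [1.0, 2.0, 3.0, 4.0],    # f0
    [2.0, 4.0, 6.0, 8.0],    # f1 = 2 * f0, so it is redundant
    [1.0, 0.0, 1.0, 0.0],    # f2
    [5.0, 1.0, 2.0, 4.0],    # f3
    [0.0, 0.0, 10.0, 10.0],  # f4
]
kept = drop_redundant(columns)
strata = stratify_by_variance(columns, kept, n_strata=2)
rng = random.Random(0)
# One feature subset per ensemble member; each would feed a base clusterer.
subsets = [stratified_feature_sample(strata, 1, rng) for _ in range(3)]
```

In a full pipeline, each subset would be passed to a base clustering algorithm and the resulting partitions combined into a consensus clustering. That step is omitted here because the abstract does not describe it. Unlike plain random feature sampling, every subset is guaranteed to contain features from every stratum.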

Keywords: feature selection; stratified sampling; data mining; high-dimensional data; cluster analysis

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
