Authors: XIE Hai-yan (解海燕) [1]; LI Jie (李杰) [1]; ZHAO Guo-dong (赵国栋)
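Affiliations: [1] Yinchuan University of Energy, Yinchuan, Ningxia 750100, China; [2] Network Information Management Center, Ningxia University, Yinchuan, Ningxia 750021, China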
Source: Computer Simulation (《计算机仿真》), 2024, No. 7, pp. 474-478 (5 pages)
Funding: Yinchuan University of Energy Research Project (2023-KY-Z-3); Natural Science Foundation of Ningxia (2021aac03118).
Abstract: Unstructured data is high-dimensional: each sample carries a very large number of features, which leads to the curse of dimensionality. Reducing the dimension while still extracting effective features is therefore difficult, and this degrades the accuracy of mining abnormal time points in big data traffic. To address this, a new spatial-mapping-based method for mining abnormal time points in unstructured high-dimensional big data traffic is proposed. A sparse regression model is built from the geometric characteristics of the approximate solution set and solved for the sparse projection matrix that maps the high-dimensional target space to a low-dimensional target subspace. Based on the density distribution, a high-density set is selected as the candidate set of cluster centers, from which the initial cluster centers are determined. A pruning algorithm is then applied to each resulting cluster to select a candidate set of time points, and a secondary judgment on that candidate set yields the abnormal time points in the high-dimensional big data traffic. Experimental results show that dimensionality reduction effectively improves the accuracy of traffic anomaly mining. By comparison, the proposed method mines abnormal time points in high-dimensional big data traffic more accurately and in less time.
Keywords: unstructured data; high-dimensional big data; traffic; abnormal time points; mining method
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
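The clustering stage summarized in the abstract (density-based choice of initial cluster centers, pruning within clusters, then a secondary judgment on the remaining candidates) can be illustrated with a small sketch. This is not the authors' algorithm: the record gives no formulas, so the function name `find_anomalous_time_points`, the parameters `radius` and `min_neighbors`, the neighbor-count density, and both decision rules are assumptions chosen for illustration. The sparse-projection step is not reproduced; the input is assumed to be already low-dimensional, with one point per time index.

```python
import math

def find_anomalous_time_points(points, k, radius, min_neighbors):
    """Illustrative sketch only; every rule below is an assumption."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density: number of other points closer than `radius`.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
               for i in range(n)]

    # Initial centers: densest points that are at least `radius` apart.
    centers = []
    for i in sorted(range(n), key=lambda i: -density[i]):
        if all(dist[i][c] >= radius for c in centers):
            centers.append(i)
        if len(centers) == k:
            break

    # Pruning (assumed rule): points within `radius` of their nearest
    # center are treated as normal; the rest become candidates.
    candidates = [i for i in range(n)
                  if min(dist[i][c] for c in centers) > radius]

    # Secondary judgment (assumed rule): keep only sparse candidates.
    return [i for i in candidates if density[i] < min_neighbors]

# Two tight groups of time points plus one isolated point at index 10.
sample = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05),
          (0, 3)]
print(find_anomalous_time_points(sample, k=2, radius=0.5, min_neighbors=2))
# prints [10]
```

Because pruning discards most points before the secondary check, the more expensive judgment would run on only a small candidate set, which is one plausible reading of the shorter running time the abstract reports.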