Self-Adaptive Streaming Big Data Learning Algorithm Based on Incremental Tangent Space Alignment (基于增量切空间校准的自适应流式大数据学习算法)  Cited by: 1

Authors: Tan Chao (谈超)[1], Ji Genlin (吉根林), Zhao Bin (赵斌)[1]

Affiliation: [1] School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2017, No. 11, pp. 2547-2557 (11 pages)

Funding: National Natural Science Foundation of China (41471371; 61702270); Natural Science Foundation of the Jiangsu Higher Education Institutions (15KJB520022)

Abstract: Manifold learning seeks low-dimensional embeddings of data observed in a high-dimensional space. As an effective nonlinear dimensionality reduction method, it has been widely applied in machine learning fields such as data mining and pattern recognition. However, when processing large-scale data streams, many existing manifold learning methods, including out-of-sample, incremental, and online learning algorithms, have low time efficiency. This paper presents a novel self-adaptive streaming big data learning algorithm based on incremental tangent space alignment (SLITSA). SLITSA adopts the idea of incremental PCA to construct the subspace incrementally, and can detect the intrinsic low-dimensional manifold structure of a data stream online or incrementally. During iteration it constructs new tangent spaces for alignment, which ensures convergence and reduces the reconstruction error. Experiments on artificial and real data sets show that the proposed algorithm achieves better classification accuracy and time efficiency than other manifold learning algorithms, and that it can be extended to online or streaming big data applications.
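The abstract does not give the update rules of SLITSA, so the following is only a minimal sketch of the general idea it names: maintaining a PCA subspace incrementally as samples stream in. The class `RunningSubspace` and its methods are hypothetical names introduced for illustration, and the rank-one (Welford-style) scatter update used here is an assumed stand-in, not the authors' method.

```python
import numpy as np


class RunningSubspace:
    """Hypothetical helper: streaming mean/scatter with a top-d eigenbasis.

    This is NOT the SLITSA algorithm; it only illustrates how a principal
    subspace can be refreshed one sample at a time without storing the stream.
    """

    def __init__(self, dim):
        self.n = 0                            # number of samples seen so far
        self.mean = np.zeros(dim)             # running sample mean
        self.scatter = np.zeros((dim, dim))   # sum of centered outer products

    def update(self, x):
        """Absorb one sample with a rank-one (Welford-style) update."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                 # deviation from the old mean
        self.mean += delta / self.n
        # outer(old deviation, new deviation) keeps the scatter exact
        self.scatter += np.outer(delta, x - self.mean)

    def basis(self, d):
        """Return an orthonormal basis (columns) of the top-d principal subspace."""
        # eigh returns eigenvalues in ascending order, so reverse the columns
        _, vecs = np.linalg.eigh(self.scatter)
        return vecs[:, ::-1][:, :d]

    def project(self, x, d):
        """Coordinates of x in the current d-dimensional subspace."""
        return self.basis(d).T @ (np.asarray(x, dtype=float) - self.mean)
```

In a tangent-space setting, one such estimator could in principle be kept per local neighborhood so that each local basis is refreshed as new points arrive; how SLITSA aligns those local coordinates into a global embedding is described in the full paper, not in this record.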

Keywords: manifold learning; nonlinear dimensionality reduction; streaming big data; incremental tangent space; self-adaptive

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
