Semi-supervised Online Classification Method for Multi-label Data Stream Based on Kernel Extreme Learning Machine (基于核极限学习机的多标签数据流半监督在线分类方法)

Cited by: 1

Authors: WANG Yuchen (王雨晨); QIU Shiyuan (邱士远); LI Peipei (李培培) [1,2,3]; HU Xuegang (胡学钢)

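Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; [2] Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education of China, Hefei University of Technology, Hefei 230009; [3] Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei 230032; [4] Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009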

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2024, No. 8, pp. 741-754 (14 pages)

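Funding: Supported by the National Natural Science Foundation of China (No. 62376085, 62076085, 62120106008) and the Special Fund of the Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center (No. JKS20230030).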

Abstract: In practical applications, large volumes of streaming data emerge, characterized by high arrival speed, massive volume, and dynamic variation. Moreover, these data streams often carry multiple labels while only a small portion of the data is labeled, which gives rise to the problems of concept drift and missing labels in multi-label data. To address these problems, this paper proposes a semi-supervised online classification method for multi-label data streams based on the kernel extreme learning machine. First, to tackle missing labels, the data stream is divided into k blocks using a sliding window; a feature similarity matrix and a label similarity matrix are constructed for each block and incorporated into the training of the kernel extreme learning machine. To suit the characteristics of streaming data, an incremental update mechanism is designed, yielding a semi-supervised online kernel extreme learning machine. Second, to adapt to concept drift, a timestamp-based discard-and-update mechanism is adopted: a data size is preset, and once the data reaches that size, the oldest unlabeled data is discarded and new data is added for updating. Finally, experiments on 10 multi-label datasets show that the proposed method adapts well to missing labels and concept drift while maintaining good classification performance.
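The timestamp-based discard step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Sample` and `TimestampBuffer`, the `max_size` parameter, and the fallback of dropping the oldest sample overall when no unlabeled sample remains are all assumptions made for illustration. The kernel extreme learning machine update itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """One stream instance; `labels` is None when the instance is unlabeled."""
    timestamp: int
    features: Sequence[float]
    labels: Optional[Sequence[int]] = None


class TimestampBuffer:
    """Fixed-size buffer that evicts the oldest unlabeled sample when full.

    Hypothetical helper: samples are assumed to arrive in timestamp order,
    so list position doubles as age.
    """

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.samples: List[Sample] = []

    def add(self, sample: Sample) -> Optional[Sample]:
        """Insert `sample`; return the evicted sample, or None if none was evicted."""
        evicted = None
        if len(self.samples) >= self.max_size:
            evicted = self._evict()
        self.samples.append(sample)
        return evicted

    def _evict(self) -> Sample:
        # Prefer the oldest unlabeled sample, as the abstract describes.
        for i, s in enumerate(self.samples):
            if s.labels is None:
                return self.samples.pop(i)
        # Assumption: if every buffered sample is labeled, drop the oldest one.
        return self.samples.pop(0)


buf = TimestampBuffer(max_size=3)
buf.add(Sample(0, [0.1], labels=[1, 0]))
buf.add(Sample(1, [0.2]))
buf.add(Sample(2, [0.3]))
dropped = buf.add(Sample(3, [0.4], labels=[0, 1]))
print(dropped.timestamp, [s.timestamp for s in buf.samples])
```

In a full pipeline, each eviction and insertion would trigger the incremental model update; here the buffer only tracks which samples remain in scope.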

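Keywords: data stream classification; semi-supervised classification; multi-label classification; kernel extreme learning machine; concept drift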

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
