Cost-sensitive Label Distribution Learning for Non-Uniform Distributed Data


Authors: FAN Jun; ZHANG Hengru[1,2]; YU Yifan; MIN Fan[1,2]

Affiliations: [1] College of Computer Science, Southwest Petroleum University, Chengdu 610500, China; [2] Lab of Machine Learning, Southwest Petroleum University, Chengdu 610500, China

Source: Journal of Southwest University (Natural Science Edition), 2024, Issue 5, pp. 40-50 (11 pages)

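Funding: National Natural Science Foundation of China (61902328); Applied Basic Research Project of the Nanchong Municipal Science and Technology Bureau (SXHZ040).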

Abstract: Label ambiguity has recently attracted considerable attention in machine learning and data mining. Label distribution learning (LDL) addresses label ambiguity by assigning a probabilistic label distribution to each instance. Existing LDL methods are mainly designed for uniformly distributed training data, whereas in real-world applications the training data are typically non-uniformly distributed. This paper therefore proposes a cost-sensitive label distribution learning method (CSLDL) for non-uniformly distributed training data, built around a novel loss function that exploits the density information of instances. First, the value range of the description degrees is evenly divided into multiple bins, and the number of instances falling into each bin is counted to derive an empirical density vector for each class label. Second, to ensure continuity between bins, the empirical density of each target bin is corrected using its neighbors: the empirical density vector is convolved with a symmetric kernel, so that each bin accounts for nearby bins as well as itself. Finally, the corrected density vectors are used to construct a cost matrix, which is combined with the Kullback-Leibler (K-L) divergence to handle non-uniformly distributed training data. CSLDL was compared with six state-of-the-art algorithms on ten real-world datasets, and the experimental results demonstrate the effectiveness and superiority of the proposed method.
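The abstract describes the density-based cost construction only in words. The Python sketch below shows one possible reading of it under stated assumptions: description degrees lie in [0, 1] and are split into `num_bins` equal-width bins per label; the per-bin counts are smoothed with a normalized Gaussian kernel (the abstract says only "symmetric kernel"); each cost C_ij is taken as the inverse of the smoothed density of the bin containing d_ij, rescaled to mean 1; and the loss multiplies each element-wise K-L term by its cost. None of these specific choices, nor the function names (`empirical_density`, `cost_matrix`, `cost_sensitive_kl`), come from the paper; they are hypothetical illustrations, not the authors' CSLDL implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = instances, columns = labels


def bin_index(degree: float, num_bins: int) -> int:
    """Map a description degree in [0, 1] to one of num_bins equal-width bins."""
    # min() keeps a degree of exactly 1.0 inside the last bin.
    return min(int(degree * num_bins), num_bins - 1)


def empirical_density(D: Matrix, num_bins: int) -> Matrix:
    """density[j][b] = number of instances whose degree for label j falls in bin b."""
    num_labels = len(D[0])
    density = [[0.0] * num_bins for _ in range(num_labels)]
    for row in D:
        for j, d in enumerate(row):
            density[j][bin_index(d, num_bins)] += 1.0
    return density


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Symmetric kernel of length 2*radius+1 summing to 1 (ASSUMED kernel shape)."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(density: List[float], kernel: List[float]) -> List[float]:
    """Convolve one density vector with a symmetric kernel, zero-padding the edges."""
    radius = len(kernel) // 2
    n = len(density)
    out = []
    for b in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = b + k - radius  # neighbouring bin contributing to bin b
            if 0 <= src < n:
                acc += w * density[src]
        out.append(acc)
    return out


def cost_matrix(D: Matrix, num_bins: int, radius: int = 1, sigma: float = 1.0) -> Matrix:
    """C[i][j] = inverse smoothed density of d_ij's bin, rescaled to mean 1 (ASSUMED rule)."""
    kernel = gaussian_kernel(radius, sigma)
    smoothed = [smooth(vec, kernel) for vec in empirical_density(D, num_bins)]
    # The bin holding d_ij contains at least instance i itself, so the density is > 0.
    raw = [[1.0 / smoothed[j][bin_index(d, num_bins)] for j, d in enumerate(row)] for row in D]
    mean = sum(sum(r) for r in raw) / (len(raw) * len(raw[0]))
    return [[c / mean for c in r] for r in raw]


def cost_sensitive_kl(D: Matrix, P: Matrix, C: Matrix, eps: float = 1e-12) -> float:
    """Mean over instances of sum_j C_ij * (d_ij * log(d_ij / p_ij) - d_ij + p_ij)."""
    total = 0.0
    for d_row, p_row, c_row in zip(D, P, C):
        for d, p, c in zip(d_row, p_row, c_row):
            p = max(p, eps)  # guard against log(d / 0)
            term = p - d
            if d > 0.0:  # 0 * log 0 is taken as 0
                term += d * math.log(d / p)
            total += c * term
    return total / len(D)


if __name__ == "__main__":
    # Toy label distributions: label 0 is usually high, one instance is atypical.
    D = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.65, 0.25, 0.1], [0.1, 0.1, 0.8]]
    C = cost_matrix(D, num_bins=5)
    print(C[3][0] > C[0][0])  # True: the rarely seen degree receives a higher cost
    P = [[1 / 3] * 3 for _ in D]  # a uniform prediction, for comparison
    print(cost_sensitive_kl(D, P, C))
```

The element-wise form c * (d log(d/p) - d + p) is used here instead of weighting the plain d log(d/p) terms because each such term is nonnegative, so the weighted loss is still minimized exactly when P = D; with equal costs and rows that each sum to 1 it reduces to the ordinary K-L divergence scaled by that cost.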

Keywords: label distribution learning; label ambiguity; non-uniformly distributed data; cost-sensitive; instance density

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
