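Authors: ZHU Xu-dong (朱旭东); XIONG Yun (熊贇) (School of Computer Science and Technology, Fudan University, Shanghai 200433, China; Research Center of Dataology and Data Science, Fudan University, Shanghai 200433, China)
Affiliations: [1] School of Computer Science and Technology, Fudan University, Shanghai 200433 [2] Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200433
Source: Computer Science (《计算机科学》), 2022, No. 6, pp. 210-216 (7 pages)
Funding: National Natural Science Foundation of China (U1636207).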
Abstract: Unlike the data distribution in general image classification, in multi-label image classification the number of samples is unevenly distributed across label categories, with a small number of head categories typically accounting for most of the samples. Because multiple labels co-occur and are correlated, and because the distribution of hard samples under multiple labels also depends on the data distribution and the category distribution, methods that address data imbalance in the single-label setting, such as re-sampling, cannot be applied effectively in the multi-label setting. This paper proposes a classification method for multi-label images based on a sample distribution loss and deep learning. First, a class-related re-sampling loss is defined for the imbalanced distribution of multi-label data, and a dynamic learning scheme is used to prevent the distribution from drifting excessively. Then, an asymmetric sample learning loss is designed that sets different learning abilities for positive samples, negative samples, and hard samples, while softening the sample learning weights to reduce information loss. Experiments on related datasets show that the proposed algorithm achieves good results on the sample learning problem under imbalanced multi-label data distributions.
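The two ideas named in the abstract, class-dependent re-weighting and an asymmetric treatment of positive and negative labels, can be illustrated with a minimal sketch. This is not the authors' formulation, which this record does not give. The function names, the inverse-frequency weighting with exponent `beta`, and the focusing parameters `gamma_pos` and `gamma_neg` are all assumptions chosen for illustration.

```python
import math


def class_weights(label_counts, beta=0.5):
    """Hypothetical class weights: inverse label frequency, softened by beta.

    beta < 1 flattens the weights so tail classes are boosted without
    letting them dominate; weights are normalized to have mean 1.
    """
    raw = [(1.0 / count) ** beta for count in label_counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def asymmetric_bce(probs, targets, weights,
                   gamma_pos=0.0, gamma_neg=2.0, eps=1e-8):
    """Assumed asymmetric focal-style binary cross-entropy for one sample.

    probs   : predicted probability per label, each in [0, 1]
    targets : 0/1 ground truth per label
    weights : per-label weights, e.g. from class_weights()
    A larger gamma_neg down-weights easy negatives more strongly than
    gamma_pos down-weights easy positives.
    """
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        if y == 1:
            total += -w * (1.0 - p) ** gamma_pos * math.log(p + eps)
        else:
            total += -w * p ** gamma_neg * math.log(1.0 - p + eps)
    return total


# Example: three labels with head, middle, and tail frequencies.
weights = class_weights([1000, 100, 10])
loss = asymmetric_bce([0.9, 0.2, 0.6], [1, 0, 1], weights)
```

With these assumed defaults, the tail label receives the largest weight, and a confidently rejected negative contributes far less to the loss than a negative the model wrongly scores highly.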
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]