Authors: ZHU Junyi, CHANG Leilei, XU Xiaobin [1,2], HAO Zhiyong, YU Haiyue [4], JIANG Jiang
Affiliations: [1] China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing (Hangzhou Dianzi University), Hangzhou, Zhejiang 310018, China; [2] School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; [3] School of Finance and Economics, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China; [4] College of Systems Engineering, National University of Defense Technology, Changsha, Hunan 410073, China
Source: Journal of Computer Applications (《计算机应用》), 2025, No. 4, pp. 1035-1041 (7 pages)
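Funding: National Key Research and Development Program of China (2022YFE0210700); National Natural Science Foundation of China (72471767); Zhejiang Provincial Basic Public Welfare Research Program (LTGG23F030003); Fundamental Research Funds for the Provincial Universities of Zhejiang (GK239909299001-010).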
Abstract: To reduce the heavy dependence of supervised learning on supervision information, a self-supervised learning method based on minimal prior knowledge is proposed. First, unlabeled data are clustered using prior knowledge of the data, or initial labels are generated for unlabeled data from their distances to the centers of the labeled data. Second, the labeled data are randomly sampled, and a machine learning method is selected to build sub-models. Third, the weight and error of each sampling are calculated, the average error is taken as the data label degree of each dataset, and an iteration threshold is set from the initial data label degree. Finally, the data label degree obtained during iteration is compared with the threshold to decide whether the termination condition has been met. Experimental results on 10 UCI public datasets show that, compared with unsupervised learning methods such as K-means, supervised learning algorithms such as the Support Vector Machine (SVM), and mainstream self-supervised learning methods such as TabNet (Tabular Network), the proposed method still achieves high classification accuracy without using labels on unbalanced datasets, or with limited labels on balanced datasets.
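The four steps in the abstract can be pictured with a rough sketch. This is not the authors' implementation: the abstract does not say how the weights, the sub-models, or the threshold are defined, so the code below assumes nearest-centroid sub-models, uniform weights, majority-vote relabeling, and a threshold equal to a fixed fraction of the initial data label degree. The names `self_label`, `nearest_center`, `class_centers`, `ratio`, and `sample_frac` are hypothetical.

```python
import random


def nearest_center(x, centers):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])))


def class_centers(points, labels, n_classes, fallback):
    """Per-class mean; a class with no members keeps its fallback center."""
    centers = []
    for k in range(n_classes):
        members = [p for p, y in zip(points, labels) if y == k]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(list(fallback[k]))
    return centers


def self_label(points, seed_centers, n_models=5, sample_frac=0.6,
               ratio=0.5, max_iter=20, seed=0):
    rng = random.Random(seed)
    n_classes = len(seed_centers)
    centers = [list(c) for c in seed_centers]
    # Step 1: initial labels from the distance to the given centers.
    labels = [nearest_center(p, centers) for p in points]
    threshold = None
    history = []
    size = max(1, int(sample_frac * len(points)))
    for _ in range(max_iter):
        votes = [[0] * n_classes for _ in points]
        errors = []
        for _ in range(n_models):
            # Step 2: random sample of the labeled data -> one sub-model.
            idx = rng.sample(range(len(points)), size)
            model = class_centers([points[i] for i in idx],
                                  [labels[i] for i in idx],
                                  n_classes, centers)
            preds = [nearest_center(p, model) for p in points]
            # Step 3: sub-model error = disagreement with current labels.
            errors.append(sum(p != y for p, y in zip(preds, labels))
                          / len(points))
            for i, p in enumerate(preds):
                votes[i][p] += 1
        # Assumption: uniform weights, so the label degree is a plain mean.
        degree = sum(errors) / len(errors)
        history.append(degree)
        if threshold is None:
            # Assumption: threshold is a fixed fraction of the first degree.
            threshold = ratio * degree
        # Relabel every point by majority vote of the sub-models.
        labels = [max(range(n_classes), key=v.__getitem__) for v in votes]
        centers = class_centers(points, labels, n_classes, centers)
        # Step 4: stop once the label degree falls to the threshold.
        if degree <= threshold:
            break
    return labels, history


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    final_labels, degrees = self_label(pts, [(0, 0), (10, 10)])
    print(final_labels, degrees)
```

In this sketch the "data label degree" is simply the mean disagreement between the sub-models and the current labels, so a value near zero means the labels have stabilized. Any other sub-model (an SVM, a decision tree) could replace the nearest-centroid step without changing the loop structure.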
Keywords: minimal prior knowledge; self-supervised learning; machine learning; data label degree; iteration threshold
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]