Self-supervised learning method using minimal prior knowledge

Authors: ZHU Junyi; CHANG Leilei; XU Xiaobin [1,2]; HAO Zhiyong; YU Haiyue [4]; JIANG Jiang

Affiliations: [1] China-Austria Belt and Road Joint Laboratory on Artificial Intelligence and Advanced Manufacturing (Hangzhou Dianzi University), Hangzhou, Zhejiang 310018, China; [2] School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; [3] School of Finance and Economics, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China; [4] College of Systems Engineering, National University of Defense Technology, Changsha, Hunan 410073, China

Source: Journal of Computer Applications (《计算机应用》), 2025, No. 4, pp. 1035-1041 (7 pages)

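Funding: National Key Research and Development Program of China (2022YFE0210700); National Natural Science Foundation of China (72471767); Zhejiang Provincial Basic Public Welfare Research Program (LTGG23F030003); Fundamental Research Funds for the Provincial Universities of Zhejiang (GK239909299001-010).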

Abstract: To reduce the heavy dependence of supervised learning on supervisory information, a self-supervised learning method based on minimal prior knowledge is proposed. First, unlabeled data are clustered using prior knowledge of the data, or initial labels are generated for unlabeled data from their distances to the centers of labeled data. Second, the newly labeled data are sampled at random, and a machine learning method is chosen to build submodels. Third, the weight and error of each random draw are computed, the average data error is taken as the data label degree of each dataset, and an iteration threshold is set from the initial data label degree. Finally, the data label degree at each iteration is compared with the threshold to decide whether the termination condition has been met. Experiments on 10 public UCI datasets show that, compared with unsupervised methods such as K-means, supervised algorithms such as Support Vector Machine (SVM), and mainstream self-supervised methods such as TabNet (Tabular Network), the proposed method still achieves high classification accuracy when using no labels on imbalanced datasets or only limited labels on balanced datasets.
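The four steps in the abstract can be pictured with a small sketch. The abstract gives no formulas, so everything concrete here is an illustrative assumption rather than the authors' implementation: the nearest-centroid submodel, measuring a draw's error as disagreement with the current labels, uniform weights across draws, majority-vote relabeling, and the names `self_label`, `n_sub`, `frac`, and `ratio` (with the threshold taken as `ratio` times the initial data label degree). It shows only the variant that starts from a few labeled points.

```python
import numpy as np


def nearest_center_labels(X, centers):
    """Assign each row of X to the index of its closest center (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def class_centers(X, y, n_classes, fallback):
    """Per-class mean; keep the fallback center when a class has no samples."""
    centers = fallback.copy()
    for k in range(n_classes):
        if np.any(y == k):
            centers[k] = X[y == k].mean(axis=0)
    return centers


def self_label(X, X_seed, y_seed, n_classes, n_sub=10, frac=0.6,
               ratio=0.5, max_iter=20, seed=0):
    """Hypothetical sketch of the iterative labeling loop (assumptions above)."""
    rng = np.random.default_rng(seed)
    # Step 1: initial labels from distances to the centers of labeled data.
    centers0 = class_centers(X_seed, y_seed, n_classes,
                             np.zeros((n_classes, X.shape[1])))
    labels = nearest_center_labels(X, centers0)
    n_draw = max(1, int(frac * len(X)))
    threshold, degree = None, None
    for _ in range(max_iter):
        votes = np.zeros((len(X), n_classes))
        errors = []
        for _ in range(n_sub):
            # Step 2: random draw of labeled data, one submodel per draw.
            idx = rng.choice(len(X), size=n_draw, replace=False)
            centers = class_centers(X[idx], labels[idx], n_classes, centers0)
            pred = nearest_center_labels(X, centers)
            # Step 3: error of this draw (assumed: disagreement rate).
            errors.append(np.mean(pred != labels))
            votes[np.arange(len(X)), pred] += 1
        # Average error over draws, used here as the data label degree.
        degree = float(np.mean(errors))
        if threshold is None:
            # Threshold derived from the initial data label degree.
            threshold = ratio * degree
        # Step 4: stop once the label degree falls to the threshold.
        if degree <= threshold:
            break
        labels = votes.argmax(axis=1)  # assumed relabeling rule
    return labels, degree, threshold
```

On well-separated clusters the initial labels are already consistent, so the label degree is zero and the loop stops after one pass; on harder data the loop keeps relabeling until the degree drops to the threshold or `max_iter` is reached.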

Keywords: minimal prior knowledge; self-supervised learning; machine learning; data label degree; iteration threshold

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
