Cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning


Authors: Yang Mei[1,2,3], Zhang Jingyu, Min Fan, Fang Yu[1,2,3]

Affiliations: [1] School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500, China; [2] Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, China; [3] Lab of Machine Learning, Southwest Petroleum University, Chengdu 610500, China

Source: Journal of Nanjing University (Natural Science), 2024, No. 4, pp. 531-541 (11 pages)

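Funding: Nanchong City and Southwest Petroleum University Science and Technology Strategic Cooperation Special Fund (23XNSYSX0084, 23XNSYSX0062); Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBDMA202102); National Natural Science Foundation of China (61976194).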

Abstract: Multi-instance learning (MIL) uses labeled bags, each composed of multiple unlabeled instances, as training data. Embedding-based methods address the bag representation problem by embedding each bag into a single vector. However, most existing methods overlook the relationship between instances and bags, which makes it difficult to guarantee the representativeness of the selected instances. In addition, single-perspective embedding methods cannot effectively extract the information that distinguishes positive bags from negative bags, which lowers the quality of the embedding vectors. This paper proposes cluster frequency analysis and dual-perspective fusion embedding for MIL (FADE). The cluster center selection technique uses the density peaks of instances to choose a certain proportion of instances from the positive and negative subspaces as cluster centers. The cluster frequency analysis technique then clusters the instances in each subspace around these centers, computes a cluster frequency indicator, and selects the centers of high-frequency clusters to form the representative instance set of that subspace. The dual-perspective fusion embedding technique uses the representative instance sets of the positive and negative subspaces, together with a difference embedding function, to extract information from the positive and negative perspectives, and fuses the two to obtain the final embedding vector. FADE was compared with seven MIL algorithms on 29 datasets. The results show that its classification accuracy is better overall than that of the seven comparison algorithms, with a significant advantage on image datasets and good performance on text and web datasets.
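
The pipeline summarized in the abstract can be illustrated with a rough sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration rather than the authors' method: the function names are hypothetical, the density-peak score uses a Gaussian kernel with a median-distance cutoff, the cluster frequency indicator is taken to be the number of distinct bags contributing instances to a cluster, the difference embedding is taken to be the nearest bag instance minus each prototype, and fusion is plain concatenation.

```python
import numpy as np

def pairwise_dist(A, B):
    # Euclidean distance between every row of A and every row of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def density_peak_centers(X, ratio):
    # Assumed density-peak score: local density times distance to the
    # nearest instance of higher density. Returns indices into X.
    d = pairwise_dist(X, X)
    cutoff = np.median(d[d > 0]) if np.any(d > 0) else 1.0
    rho = np.exp(-(d / cutoff) ** 2).sum(axis=1)
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    k = max(1, int(round(ratio * len(X))))
    return np.argsort(-(rho * delta))[:k]

def prototypes(bags, center_ratio=0.3, keep_ratio=0.5):
    # Build the representative instance set of one subspace
    # (all positive bags, or all negative bags).
    X = np.vstack(bags)
    source = np.concatenate([np.full(len(b), i) for i, b in enumerate(bags)])
    centers = density_peak_centers(X, center_ratio)
    # Assign every instance to its nearest center to form clusters.
    assign = pairwise_dist(X, X[centers]).argmin(axis=1)
    # Assumed frequency indicator: number of distinct source bags per cluster.
    freq = np.array([len(np.unique(source[assign == c]))
                     for c in range(len(centers))])
    keep = max(1, int(round(keep_ratio * len(centers))))
    top = np.argsort(-freq)[:keep]
    return X[centers[top]]

def difference_embed(bag, protos):
    # Assumed difference embedding: for each prototype, the nearest
    # instance in the bag minus that prototype, flattened.
    nearest = pairwise_dist(protos, bag).argmin(axis=1)
    return (bag[nearest] - protos).ravel()

def fade_like_embed(bag, pos_protos, neg_protos):
    # Fuse the positive and negative views by concatenation (assumption).
    return np.concatenate([difference_embed(bag, pos_protos),
                           difference_embed(bag, neg_protos)])

# Toy data: 4 positive and 4 negative bags, 5 instances each, 3 features.
rng = np.random.default_rng(0)
pos_bags = [rng.normal(1.0, 1.0, size=(5, 3)) for _ in range(4)]
neg_bags = [rng.normal(-1.0, 1.0, size=(5, 3)) for _ in range(4)]
pos_protos = prototypes(pos_bags)
neg_protos = prototypes(neg_bags)
vec = fade_like_embed(pos_bags[0], pos_protos, neg_protos)
```

Under these assumptions each bag becomes one fixed-length vector, which could then be passed to any standard single-instance classifier. The ratios `center_ratio` and `keep_ratio` are placeholders and do not correspond to values reported in the paper.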

Keywords: multi-instance learning; embedding methods; cluster frequency; instance source; dual-perspective fusion

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
