Voting instance selection algorithm based on learning to hash

Cited by: 1

Authors: HUANG Yajie (黄雅婕), ZHAI Junhai (翟俊海)[1,2], ZHOU Xiang (周翔), LI Yan (李艳)[1,2,3]

Affiliations: [1] College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China; [2] Hebei Key Laboratory of Machine Learning and Computational Intelligence (Hebei University), Baoding, Hebei 071002, China; [3] Research Center for Applied Mathematics and Interdisciplinary Sciences, Beijing Normal University at Zhuhai, Zhuhai, Guangdong 519087, China

Source: Journal of Computer Applications (《计算机应用》), 2022, No. 2, pp. 389-394 (6 pages)

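Funding: Key R&D Special Project of the Hebei Provincial Science and Technology Program (19210310D); Natural Science Foundation of Hebei Province (F2018201096); Hebei University Postgraduate Innovation Funding Project (hbu2019ss077).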

Abstract: With the massive growth of data, how to store and use data has become a pressing problem in both academic research and industrial applications. Instance selection is one way to address it: it selects representative instances from the original data according to predefined rules, which reduces the difficulty of subsequent work. On this basis, a voting instance selection algorithm based on learning to hash was proposed. Firstly, Principal Component Analysis (PCA) was used to map the high-dimensional data to a low-dimensional space. Secondly, the k-means algorithm was combined with vector quantization in an iterative procedure, and each data point was represented by the hash code of its cluster center. Then, instances were randomly selected in proportion from the data grouped in this way, and the final instances were chosen by voting over several independent runs of the algorithm. Compared with the Condensed Nearest Neighbor (CNN) algorithm and LSH-IS-F (Instance Selection algorithm by Hashing with two passes), a linear-complexity instance selection algorithm for big data, the proposed algorithm improved the compression ratio by 19% on average. The proposed algorithm is simple in concept and easy to implement, and its compression ratio can be controlled by adjusting parameters. Experimental results on seven datasets show that, at similar test accuracy, the proposed algorithm has a clear advantage over random hashing in compression ratio and running time.
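The pipeline described in the abstract (PCA projection, k-means quantization into hash buckets, proportional random selection, voting over independent runs) can be illustrated with a minimal NumPy sketch. The abstract does not give the implementation details, so everything here is an assumption made for illustration: the function names, the use of the cluster index as the hash bucket, selection per bucket, the parameters `k`, `ratio`, `n_runs`, and the vote threshold `min_votes` are all hypothetical and are not the authors' code.

```python
import numpy as np

def pca_project(X, n_components):
    """Project centered data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans_buckets(Z, k, n_iter, rng):
    """Plain Lloyd's k-means; returns the index of the nearest center.

    Assumption: the cluster index stands in for the hash code of the
    cluster center, so instances with the same index share a bucket.
    """
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def select_once(Z, k, ratio, n_iter, rng):
    """One run: bucket the data, then keep a random fraction of each bucket."""
    buckets = kmeans_buckets(Z, k, n_iter, rng)
    chosen = np.zeros(len(Z), dtype=bool)
    for b in np.unique(buckets):
        idx = np.flatnonzero(buckets == b)
        n_keep = max(1, int(round(ratio * len(idx))))
        chosen[rng.choice(idx, size=n_keep, replace=False)] = True
    return chosen

def voting_instance_selection(X, k=4, ratio=0.5, n_runs=5, min_votes=3,
                              n_components=2, n_iter=10, seed=0):
    """Keep instances selected in at least `min_votes` of `n_runs` runs."""
    rng = np.random.default_rng(seed)
    Z = pca_project(X, n_components)
    votes = np.zeros(len(X), dtype=int)
    for _ in range(n_runs):
        votes += select_once(Z, k, ratio, n_iter, rng)
    return np.flatnonzero(votes >= min_votes)

# Usage on synthetic data (hypothetical example, not from the paper).
X_demo = np.random.default_rng(1).normal(size=(200, 5))
kept = voting_instance_selection(X_demo)
print(len(kept), "of", len(X_demo), "instances kept")
```

In this sketch, lowering `ratio` or raising `min_votes` keeps fewer instances, which is one plausible way to realize the parameter-controlled compression ratio mentioned in the abstract.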

Keywords: instance selection; learning to hash; Hamming distance; vector quantization; voting method

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
