Parallel Support Vector Machine Algorithm Based on Relative Entropy and Cosine Similarity

Authors: MAO Yimin[1], GUO Binbin, YI Jianbing[1], CHEN Zhigang[2]

Affiliations: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China; [2] College of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China

Source: Computer Integrated Manufacturing Systems (《计算机集成制造系统》), 2024, No. 9, pp. 3183-3198 (16 pages)

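Funding: National Natural Science Foundation of China (41562019); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109605).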

Abstract: Parallel support vector machine (SVM) algorithms in big data environments suffer from large distribution deviation among data subsets, low parallel efficiency, and inaccurate filtering of non-support vectors. To address these problems, a Parallel Support Vector Machine algorithm based on Relative Entropy and Cosine Similarity (RC-PSVM) was proposed. First, a Data Partitioning strategy based on Relative Entropy (DPRE) was proposed, which balanced the relative entropy between the current subset and the original data set and assigned each sample to a suitable subset, reducing the distribution deviation of the subsets. Then, a Redundancy Level Detection Strategy based on Cosine Similarity (CS-RLDS) was designed: it computed the cosine similarity between the normal vectors of the local SVMs in adjacent layers and compared this similarity with a preset threshold to identify and stop redundant layers, which improved parallel efficiency. Finally, a Non-Support Vector Filtering strategy (NSVF) was developed, which computed a support vector similarity from the distances between a sample and the decision boundaries of multiple local SVM models in order to identify non-support vectors, solving the problem of inaccurate non-support-vector filtering. Experiments showed that RC-PSVM achieved better classification performance and ran more efficiently on big data.
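The abstract describes DPRE, CS-RLDS and NSVF only at a high level and does not give their formulas, so the following Python sketch is a hypothetical illustration of the underlying ideas, not the authors' method. Assumptions: relative entropy is taken to be the Kullback-Leibler divergence between class-label distributions; DPRE is read as a greedy pass that puts each sample into the subset whose label distribution stays closest to the global one, with a capacity cap of ceil(n/k); CS-RLDS assumes a cascade of merged linear SVMs whose normal vectors `w` are compared layer to layer; NSVF's "support vector similarity" is replaced by a simple vote fraction over local linear models `(w, b)`. All function names and the parameters `threshold`, `slack` and `tau` are hypothetical.

```python
import math
from collections import Counter


def distribution(counts):
    """Normalise a Counter of label counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def relative_entropy(p, q, eps=1e-9):
    """KL divergence D(p || q) between two label distributions (dict label -> prob)."""
    return sum(pk * math.log((pk + eps) / (q.get(k, 0.0) + eps))
               for k, pk in p.items() if pk > 0)


# --- DPRE-style partitioning (hypothetical greedy reading) ---
def partition_by_relative_entropy(labels, k):
    """Assign each sample index to one of k subsets, choosing the subset whose
    label distribution after the assignment has the smallest KL divergence from
    the global distribution (ties go to the smaller subset). Each subset holds
    at most ceil(n / k) samples."""
    cap = math.ceil(len(labels) / k)
    global_dist = distribution(Counter(labels))
    subsets = [[] for _ in range(k)]
    counts = [Counter() for _ in range(k)]
    for idx, y in enumerate(labels):
        def cost(j):
            trial = counts[j].copy()
            trial[y] += 1
            kl = relative_entropy(distribution(trial), global_dist)
            return (round(kl, 12), len(subsets[j]))  # rounding absorbs float noise
        best = min((j for j in range(k) if len(subsets[j]) < cap), key=cost)
        subsets[best].append(idx)
        counts[best][y] += 1
    return subsets


# --- CS-RLDS-style redundancy check between adjacent cascade layers ---
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_redundant_layer(w_prev, w_curr, threshold=0.99):
    """True when the merged hyperplane barely changed between adjacent layers,
    so later layers could be skipped. The 0.99 default is an assumption."""
    return cosine_similarity(w_prev, w_curr) >= threshold


# --- NSVF-style filtering over several local linear SVMs (w, b) ---
def sv_score(x, y, models, slack=0.1):
    """Fraction of local models under which (x, y) lies on or inside the margin,
    i.e. y * (w.x + b) <= 1 + slack. For a linear SVM, points with
    y * (w.x + b) > 1 are not support vectors."""
    inside = sum(1 for w, b in models
                 if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 1 + slack)
    return inside / len(models)


def filter_non_support_vectors(data, models, slack=0.1, tau=0.5):
    """Keep (x, y) pairs that at least a fraction tau of local models place near the margin."""
    return [(x, y) for x, y in data if sv_score(x, y, models, slack) >= tau]
```

Breaking KL ties by subset size keeps the greedy pass from piling same-class samples into the first subset when the input happens to be sorted by class. Combining several local models in NSVF reflects the abstract's idea that a single local model's margin is an unreliable signal; how these pieces would be distributed over MapReduce map and reduce tasks is not shown here.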

Keywords: big data; MapReduce framework; parallel support vector machine; relative entropy; cosine similarity

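CLC numbers: TP311.13 (Automation and Computer Technology: Computer Software and Theory); TP181 (Automation and Computer Technology: Computer Science and Technology)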
