Parallel deep forest algorithm based on mutual information and mixed weighting (cited by: 1)

Authors: Mao Yimin [1,2]; Li Wenhao

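Affiliations: [1] School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China; [2] School of Information Engineering, Shaoguan University, Shaoguan, Guangdong 512000, China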

Source: Application Research of Computers (《计算机应用研究》), 2024, Issue 2, pp. 473-481 (9 pages)

Funding: Guangdong Province Key-Area Research and Development Program (2022B0101020002); Guangdong Province Key Improvement Project (2022ZDJS048).

Abstract: In big data environments, parallel deep forest algorithms suffer from too many irrelevant and redundant features, imbalanced multi-granularity scanning, inadequate classification performance, and low parallelization efficiency. To address these problems, this paper proposes a parallel deep forest algorithm based on mutual information and mixed weighting (PDF-MIMW). First, in the feature dimensionality reduction stage, a feature extraction strategy based on mutual information (FE-MI) filters the original features by combining measures of feature importance, interaction, and redundancy, eliminating excessive irrelevant and redundant features. Second, in the multi-granularity scanning stage, an improved multi-granularity scanning strategy based on padding (IMGS-P) pads the reduced features and randomly samples the subsequences produced by window scanning, keeping the multi-granularity scan balanced. Third, in the cascade forest construction stage, a sub-forest construction strategy based on mixed weighting (SFC-MW) uses the Spark framework to build weighted sub-forests in parallel, improving the model's classification performance. Finally, in the class vector merging stage, a load balancing strategy based on a hybrid particle swarm optimization algorithm (LB-HPSO) optimizes the load distribution among task nodes in the Spark framework, reducing the waiting time during class vector merging and improving the model's parallelization efficiency. Experiments show that PDF-MIMW achieves better classification performance and higher training efficiency in big data environments.
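The abstract names the first two stages but gives no formulas, so the following is only a rough sketch of the general ideas, not the authors' FE-MI or IMGS-P. It assumes discrete features, uses a greedy relevance-minus-redundancy score (an mRMR-style stand-in that ignores the interaction term), and assumes zero padding of `window - 1` positions on each side so that every original feature falls into the same number of windows. All function names and parameters here are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    Assumption: this mRMR-style score stands in for the paper's combined
    importance/interaction/redundancy measure, which the abstract does not define.
    """
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        relevance = mutual_information(columns[j], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def padded_windows(features, window, pad_value=0):
    """Slide a window over the padded vector.

    Assumption: padding window - 1 values on each side makes every original
    feature appear in exactly `window` subsequences.
    """
    pad = [pad_value] * (window - 1)
    padded = pad + list(features) + pad
    return [padded[i:i + window] for i in range(len(padded) - window + 1)]


def sample_windows(windows, m, seed=0):
    """Randomly keep at most m subsequences (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(windows, min(m, len(windows)))
```

In a toy case where one feature is an exact copy of another, the redundancy penalty makes the selector skip the copy in favor of a feature that carries new information about the label. Without padding, a window of width 2 over three features would cover the first and last feature once and the middle one twice; with padding, each is covered twice.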

Keywords: Spark framework; parallel deep forest; mutual information; load balancing

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
