Unbalanced Big Data Classification Algorithm Based on Random Forest Model  Cited by: 2


Authors: WEI Yaming (魏亚明) [1]; MENG Yuan (孟媛)

Affiliations: [1] Information Department, Xuzhou Central Hospital, Xuzhou 221000, Jiangsu, China; [2] Graduate School, Jiangsu Normal University, Xuzhou 221000, Jiangsu, China

Source: Journal of Jilin University (Information Science Edition), 2023, No. 6, pp. 1079-1085 (7 pages)

Funding: Natural Science Foundation of Jiangsu Province (BK2013573).

Abstract: To address the poor classification performance of current imbalanced big data classification algorithms, an imbalanced big data classification algorithm based on the random forest model is proposed. First, the SVM (Support Vector Machine) algorithm is used to filter information in the imbalanced big data, and the reverse k-nearest neighbor method is then used to detect and eliminate outliers. Incremental principal component analysis removes the singularity of the covariance matrix of the imbalanced big data, and the entropy method is used to analyze the weights, so that the feature information of the imbalanced big data can be extracted. CART (Classification and Regression Trees) decision trees serve as the base classifiers from which the random forest classifier is built. Finally, the extracted feature information is input into the classifier to classify the imbalanced big data. Experimental results show that the proposed algorithm samples imbalanced big data well and achieves high classification accuracy, stability, and performance.
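The abstract names its preprocessing steps but gives no formulas or parameters, so the following is only a minimal sketch of two of them in plain Python: reverse k-nearest-neighbor outlier removal and entropy-method weighting. The function names, the defaults `k=2` and `min_votes=1`, and the rule "drop points that too few other points count as a neighbor" are illustrative assumptions, not the authors' implementation.

```python
import math


def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            counts[j] += 1  # point i "votes" for each of its k nearest neighbors
    return counts


def drop_outliers(points, k=2, min_votes=1):
    """Keep only points with at least min_votes reverse neighbors (assumed rule)."""
    counts = reverse_knn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_votes]


def entropy_weights(rows):
    """Entropy weight method: columns whose values are more unevenly spread get larger weights.

    Assumes at least two rows, non-negative values, and positive column sums.
    """
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s == 0:
        return [1.0 / m] * m  # all columns equally uninformative
    return [d / s for d in divergences]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(drop_outliers(pts))  # the isolated point (10, 10) is removed
    print(entropy_weights([[1, 1], [1, 2], [1, 9]]))
```

In this sketch a point is treated as an outlier when no other point counts it as a near neighbor, and a constant feature column receives (close to) zero weight. The quadratic neighbor search is only for illustration and would not scale to big data.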

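Keywords: random forest model; imbalanced big data classification; SVM (support vector machine); reverse k-nearest neighbor method; CART decision tree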

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
