Authors: LIANG Dan-ning (梁丹凝) [1]; LIANG Jian (梁坚)
Affiliations: [1] College of Biomedical Information and Engineering, Hainan Medical University, Haikou, Hainan 570000, China; [2] School of Science, East China University of Technology, Nanchang, Jiangxi 330013, China
Source: Computer Simulation (《计算机仿真》), 2024, No. 6, pp. 477-480, 497 (5 pages)
Abstract: Classifying and grading imbalanced data is an indispensable step in using big data technology efficiently, but the process is easily disrupted by data attributes, redundancy, and imbalance. To address this, a naive Bayes classification and grading algorithm for imbalanced data is proposed. First, the synthetic minority oversampling technique (SMOTE) is used to reduce the degree of data imbalance. Next, the distance correlation coefficient and the maximal information coefficient are used to select and screen features of the imbalanced data. The Relief algorithm then assigns weights to the selected features, which are input into a naive Bayes model for classification. Finally, a dynamic threshold algorithm is applied to grade the data. Experimental results show that the proposed algorithm has a short running time and high classification accuracy, and effectively improves data processing performance.
Keywords: imbalance degree; distance correlation coefficient; feature matrix; weight allocation; posterior probability
Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]
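
Illustrative sketch: the abstract names SMOTE-style oversampling and the distance correlation coefficient as the first two steps of the pipeline. The record does not give the authors' implementation, so the code below is only a minimal, standard-library Python illustration of those two textbook techniques. The function names `smote_oversample` and `distance_correlation`, the neighbour count `k`, and the sample data are assumptions made for this example. The maximal information coefficient, Relief weighting, naive Bayes classification, and dynamic threshold steps are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Assumes `minority` holds at least two points of equal dimension."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D sequences."""
    n = len(x)

    def centred(values):
        # Pairwise distance matrix, then double centring
        d = [[abs(a - b) for b in values] for a in values]
        row = [sum(r) / n for r in d]
        total = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + total for j in range(n)]
                for i in range(n)]

    a, b = centred(x), centred(y)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n**2

    dcov2, var_x, var_y = mean_prod(a, b), mean_prod(a, a), mean_prod(b, b)
    if var_x * var_y == 0:
        return 0.0  # a constant sequence carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(var_x * var_y))


# Hypothetical minority-class samples (two features each)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_new=6)
```

Each synthetic point lies on the segment between two existing minority samples, which is why this kind of oversampling lowers the imbalance ratio without duplicating rows. In a feature-screening step, a feature whose distance correlation with the label is near zero would be a candidate for removal; the cut-off used in the paper is not stated in this record.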