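Affiliations: [1] Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen, Guangdong 518107; [2] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060
Source: Computer Science (《计算机科学》), 2025, No. 3, pp. 137-151 (15 pages)
Funding: General Program of the Natural Science Foundation of Guangdong Province (2023A1515011667); Key Program of Shenzhen Basic Research (JCYJ20220818100205012); General Program of Shenzhen Basic Research (JCYJ20210324093609026); Shenzhen Science and Technology Major Project (202302D074).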
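Abstract: The naive Bayesian classifier (NBC) is regarded as one of the ten classic algorithms in machine learning. It is known for its sound theoretical basis and simple model structure, and it achieves good classification results in many practical applications. However, the conditional attribute independence assumption limits the performance of NBC to some extent, so many improvements have been proposed to alleviate this problem; the weighted naive Bayesian classifier (WNBC) is one of them. Building on an in-depth analysis of the role of marginal probability weights, this paper proposes a Risk Minimization-Based Weighted Naive Bayesian Classifier (RM-WNBC), which considers both the empirical risk of the classifier and the structural risk of the weights when the weights are determined. Unlike existing improvement strategies, which focus excessively on the external generalization performance of NBC, RM-WNBC improves generalization by starting from the internal probability distribution of NBC. The empirical risk measures the classification capability of the weighted classifier and is expressed by the estimation quality of the posterior probabilities; the structural risk characterizes how the weighted classifier handles attribute dependence and is expressed by the mean squared error of the class-conditional probabilities. Minimizing the empirical risk ensures that RM-WNBC attains good training accuracy, while minimizing the structural risk enables RM-WNBC to achieve the best capability for expressing attribute dependence. To obtain the optimal weights, an efficient and convergent weight-update strategy is derived that guarantees minimization of both the structural and the empirical risk. The feasibility, rationality, and effectiveness of RM-WNBC are validated on 31 UCI and KEEL benchmark classification data sets. The experimental results show that: 1) the training and testing accuracies of RM-WNBC increase gradually as the marginal probability weights are updated, until convergence; 2) RM-WNBC expresses attribute dependence better than existing weighted naive Bayesian classifiers; 3) at the given significance level, RM-WNBC achieves better training and testing performance on the 31 data sets than the classical NBC, three Bayesian networks, four weighted naive Bayesian classifiers, and one feature selection-based NBC.
English abstract: The naive Bayesian classifier (NBC), known for its sound theoretical basis and simple model structure, is a classical classification algorithm that has been deemed one of the top 10 algorithms in data mining and machine learning. However, the independence assumption of NBC limits its prediction performance when attribute dependence exists. The weighted NBC (WNBC) is an improved version of NBC that has good generalization performance and low training complexity. This paper proposes a risk minimization-based WNBC (RM-WNBC) that considers both empirical risk and structural risk: the empirical risk measures the classification performance of RM-WNBC, and the structural risk depicts its capability to express dependence. Unlike existing improvements to NBC, RM-WNBC alleviates the independence assumption and further enhances the generalization capability of NBC by considering the internal characteristics of NBC rather than its external characteristics. The empirical risk is represented by the estimation quality of posterior probabilities, while the structural risk is represented by the mean squared error of joint probabilities. Minimizing the empirical risk and the structural risk guarantees that RM-WNBC can achieve both good classification performance and appropriate dependence representation. To obtain the optimal weights of marginal probabilities, an efficient and convergent updating strategy is designed by minimizing the empirical and structural risks. A series of experiments is conducted on 31 benchmark data sets to validate the feasibility, rationality, and effectiveness of RM-WNBC. The experimental results show that the optimization of the RM-WNBC weights is convergent and that RM-WNBC not only handles attribute dependence well but also obtains better training and testing accuracies than the classical NBC, three typical Bayesian networks, four WNBCs, and a feature selection-based NBC.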
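As a rough illustration of the kind of classifier the abstract builds on, the sketch below implements a generic attribute-weighted naive Bayes posterior, P(c|x) ∝ P(c) · ∏_j P(x_j|c)^{w_j}, for categorical data. It is not the RM-WNBC method: the abstract gives neither the paper's exact weighting scheme nor its risk-based weight-update rule, so the weights here are simply supplied by the caller. The function names `fit_counts` and `weighted_posterior` and the use of Laplace smoothing are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def fit_counts(X, y):
    """Collect class counts and per-attribute value counts (categorical data)."""
    n_attrs = len(X[0])
    class_counts = Counter(y)
    # value_counts[j][c][v] = number of rows of class c whose attribute j equals v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains

def weighted_posterior(x, class_counts, value_counts, domains, weights):
    """Return P(c|x) under P(c) * prod_j P(x_j|c)**weights[j], normalized over c.

    Assumption: probabilities are Laplace-smoothed; weights are given, not learned.
    """
    n = sum(class_counts.values())
    k = len(class_counts)
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log((n_c + 1) / (n + k))  # smoothed class prior
        for j, v in enumerate(x):
            p = (value_counts[j][c][v] + 1) / (n_c + len(domains[j]))
            score += weights[j] * math.log(p)  # weight acts as an exponent
        log_scores[c] = score
    # Normalize in log space for numerical stability.
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}
```

With all weights equal to 1 this reduces to the classical NBC; setting a weight to 0 removes that attribute's influence on the posterior. A risk-minimizing method such as the one described above would choose the weights by optimization rather than by hand.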
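Keywords: naive Bayes; independence assumption; weighted naive Bayes; structural risk; empirical risk; Bayesian network
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]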