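Institutions: [1] School of Science, East China University of Technology, Nanchang, Jiangxi; [2] School of Economics and Management, East China University of Technology, Nanchang, Jiangxi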
Source: Advances in Applied Mathematics (《应用数学进展》), 2024, Issue 6, pp. 2901-2911 (11 pages)
Abstract: The naive Bayes algorithm is simple and efficient and is widely used in text classification. However, it assumes that attributes are conditionally independent given the class, an assumption that is rarely satisfied in practice. Moreover, with the arrival of the big data era, text data increasingly exhibits nonlinear structure, which classical naive Bayes fits poorly. To address these problems, this paper proposes a locally instance-weighted naive Bayes classification algorithm based on the distance correlation coefficient. First, the distance correlation coefficient between each attribute and the class is computed and embedded as an attribute weight into the document distance measure, yielding a new distance metric. Second, the distances between training and test samples are measured and used for instance selection and instance weighting, producing a locally instance-weighted Bayesian text classifier. Finally, the algorithm's performance is compared experimentally on 15 text datasets from the WEKA platform. The results show that the proposed algorithm outperforms all three classical naive Bayes text classifiers in classification accuracy.
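The abstract outlines the pipeline but does not give the paper's exact document distance, selection rule or weighting function, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes documents are term-frequency vectors; the attribute weight is the sample distance correlation between a term's column and the class labels (labels compared with a discrete metric); the distance is an attribute-weighted Euclidean distance; instance selection keeps the k nearest training documents; and each kept document is weighted by 1 / (1 + distance) before fitting a Laplace-smoothed multinomial naive Bayes. The function names, the parameter `k` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def _double_center(dist):
    """Return A_ij = a_ij - mean(row i) - mean(col j) + grand mean."""
    n = len(dist)
    row = [sum(r) / n for r in dist]
    col = [sum(dist[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[dist[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(xs, ys):
    """Sample distance correlation between one numeric attribute and the class labels.

    Assumption: labels use the discrete metric (0 if equal, 1 otherwise). dCor is
    invariant to rescaling a metric, so this equals one-hot encoding + Euclidean.
    """
    n = len(xs)
    a = _double_center([[abs(xs[i] - xs[j]) for j in range(n)] for i in range(n)])
    b = _double_center([[0.0 if ys[i] == ys[j] else 1.0 for j in range(n)] for i in range(n)])
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar2_x = sum(v * v for r in a for v in r) / n ** 2
    dvar2_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0  # constant attribute or a single class: treat as no dependence
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def attribute_weights(train_x, train_y):
    """Hypothetical helper: weight of attribute j = dCor(column j, class labels)."""
    return [distance_correlation([doc[j] for doc in train_x], train_y)
            for j in range(len(train_x[0]))]


def weighted_distance(x, t, w):
    """Hypothetical choice: attribute-weighted Euclidean distance between two documents."""
    return math.sqrt(sum(wj * (xj - tj) ** 2 for wj, xj, tj in zip(w, x, t)))


def predict(train_x, train_y, weights, test_x, k=3):
    """Locally instance-weighted multinomial naive Bayes for one test document.

    Hypothetical choices: keep the k nearest training documents (instance
    selection) and weight each by 1 / (1 + distance) (instance weighting).
    """
    m = len(test_x)
    dist = [weighted_distance(x, test_x, weights) for x in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: dist[i])[:k]
    classes = sorted(set(train_y))
    prior = defaultdict(float)
    counts = defaultdict(lambda: [0.0] * m)
    for i in nearest:
        iw = 1.0 / (1.0 + dist[i])
        prior[train_y[i]] += iw
        for j in range(m):
            counts[train_y[i]][j] += iw * train_x[i][j]
    total = sum(prior.values())
    best, best_score = None, -math.inf
    for c in classes:
        # Laplace-smoothed weighted class prior and term probabilities, in log space
        score = math.log((prior[c] + 1.0) / (total + len(classes)))
        denom = sum(counts[c]) + m
        for j in range(m):
            if test_x[j]:
                score += test_x[j] * math.log((counts[c][j] + 1.0) / denom)
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    # Made-up term-frequency vectors over a 3-word vocabulary
    train_x = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0]]
    train_y = ["sports", "sports", "tech", "tech"]
    w = attribute_weights(train_x, train_y)
    print(predict(train_x, train_y, w, [2, 0, 1]))  # sports
```

In the toy data the third term occurs equally often in both classes, so its distance correlation with the class is 0 and it no longer affects the distance. The weights are computed once per training set and reused for every test document, since each attribute's dCor needs an O(n²) distance matrix.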
Keywords: text classification; naive Bayes; instance selection; instance weighting; distance correlation coefficient
Classification number (CLC): TP3 [Automation and Computer Technology: Computer Science and Technology]