A Modified Naive Bayes Classifier Based on Misclassification with Application to Industry Classification of Public Hotline

Original title: 基于误分类修正的朴素贝叶斯分类器及其在政务热线行业分类中的应用


Authors: GUAN Guo-yu (官国宇); YANG Hao-xiang (杨皓翔); WANG Yun-hao (王运豪); HAO Li-zhu (郝立柱) [2]

Affiliations: [1] School of Economics and Management, Northeast Normal University, Changchun 130117, Jilin, China; [2] Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, Jilin, China

Source: Journal of Applied Statistics and Management (《数理统计与管理》), 2025, No. 1, pp. 179-190 (12 pages)

Funding: National Social Science Fund of China (19CTJ013).

Abstract: When traditional statistical classification methods are applied to industry text classification for the Public Hotline, they exhibit a certain systematic bias. To correct this bias and thereby reduce the extra labor and time costs caused by misclassification, this paper takes the naive Bayes model as the benchmark classifier, introduces correction coefficients into the maximum a posteriori decision rule, learns these coefficients from the misclassification results on a validation set, and applies the resulting classifier to industry text classification for the Public Hotline. The empirical results show that, compared with the benchmark classifier, the modified classifier improves classification accuracy by at least 1 percentage point and reduces the volume of misclassified samples by 4 percentage points. Because the Public Hotline handles a very large number of text work orders, the method is useful for improving administrative service efficiency and lowering human resource costs.
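The abstract does not state the exact form of the correction coefficients (called "modified coefficients" in the English title) or how they are estimated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one multiplicative coefficient per class, stored as an additive offset in log space, and a simple perceptron-style update on validation errors. The function names, the step size, the number of epochs, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace smoothing (baseline classifier)."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_lik

def predict(doc, model, log_coef=None):
    """MAP rule; if log_coef is given, add the per-class log correction to each score."""
    classes, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Words outside the training vocabulary are ignored.
        s = log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        if log_coef:
            s += log_coef.get(c, 0.0)
        scores[c] = s
    return max(scores, key=scores.get)

def learn_coefficients(val_docs, val_labels, model, step=0.1, epochs=20):
    """Assumed update rule: on each validation error, raise the true class's
    log coefficient and lower the wrongly predicted class's by `step`."""
    log_coef = {c: 0.0 for c in model[0]}
    for _ in range(epochs):
        for d, y in zip(val_docs, val_labels):
            pred = predict(d, model, log_coef)
            if pred != y:
                log_coef[y] += step
                log_coef[pred] -= step
    return log_coef

# Hypothetical toy data: the training set is dominated by the "urban" class.
train_docs = [["road", "repair"], ["road", "light"], ["road", "noise"], ["school", "fee"]]
train_labels = ["urban", "urban", "urban", "education"]
val_docs = [["school", "road"], ["road", "repair"]]
val_labels = ["education", "urban"]

model = train_nb(train_docs, train_labels)
coef = learn_coefficients(val_docs, val_labels, model)
print(predict(["school", "road"], model))        # baseline MAP decision
print(predict(["school", "road"], model, coef))  # decision with correction coefficients
```

In this toy setup the baseline leans toward the majority class, and the learned offsets shift the decision boundary using only validation errors, leaving the naive Bayes parameters untouched. A real application would evaluate the corrected rule on a separate test set, since the coefficients are tuned on the validation set.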

Keywords: naive Bayes; Public Hotline; text classification; correction coefficient

Classification codes: O212 [Science: Probability Theory and Mathematical Statistics]; O235 [Science: Mathematics]

 
