Research on Intelligent Classification of Urban Security Public Opinion Information Based on the Naive Bayes Algorithm


Authors: DING Jiawei (丁嘉伟); LU Meisong (路美松); WU Binbin (伍彬彬) [1]; WANG Chao (王超); ZHONG Tianyu (钟天宇); ZHANG Lin (张林)

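Affiliations: [1] China Academy of Safety Science and Technology, Beijing 100012, China; [2] Shenzhen Metro Construction Group Co., Ltd., Shenzhen, Guangdong 518031, China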

Source: Journal of Safety Science and Technology (《中国安全生产科学技术》), 2024, Issue S1, pp. 262-266 (5 pages)

Abstract: To address the problem that urban security public opinion data are cluttered and useful information cannot be extracted from them effectively, this paper proposes an intelligent classification method for urban security public opinion information based on the Naive Bayes algorithm. Because such data are full of noise and the texts are generally short, the Naive Bayes algorithm is chosen for training. Before training, the text is segmented into words, high-frequency words and stop words are removed, and the result is converted into TF-IDF vectors to improve the model's classification quality. The trained model is then used to screen the database records, removing noisy data and achieving intelligent classification. The results show that the proposed method can effectively remove noisy data and improve the intelligent classification of urban security public opinion information. The results can provide effective support for subsequent data mining work and significantly improve its efficiency and accuracy.
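The pipeline described in the abstract (segmentation, removal of stop words and high-frequency words, TF-IDF weighting, Naive Bayes training, then screening records) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The stop-word list, the 0.8 document-frequency cutoff, the multinomial variant with Laplace smoothing, the English toy records, and the names `tokenize`, `fit_idf`, `tfidf`, `TfidfNaiveBayes`, and `screen` are all assumptions made for illustration. Real Chinese text would first need a word segmenter, which is replaced here by whitespace splitting.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "of", "in", "at", "on", "for", "is", "and", "to"}

def tokenize(text):
    # Assumption: tokens are whitespace-separated. Chinese text would need
    # a word segmenter at this step instead.
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def fit_idf(docs, max_doc_ratio=0.8):
    # Drop high-frequency terms (present in more than max_doc_ratio of the
    # documents) and compute a smoothed IDF weight for the remaining terms.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0
            for t, c in df.items() if c / n <= max_doc_ratio}

def tfidf(doc, idf):
    # Term frequency times IDF, restricted to the fitted vocabulary.
    counts = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # label -> term -> summed TF-IDF
        for doc, label in zip(docs, labels):
            self.weights[label].update(tfidf(doc, self.idf))
        return self

    def predict(self, text):
        vec = tfidf(tokenize(text), self.idf)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            w = self.weights[label]
            denom = sum(w.values()) + self.alpha * len(self.idf)
            # Log prior plus TF-IDF-weighted log likelihood of each term.
            score = math.log(n_docs / total)
            for term, value in vec.items():
                score += value * math.log((w[term] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def screen(model, records, keep_label="relevant"):
    # Keep only the records the model assigns to keep_label.
    return [r for r in records if model.predict(r) == keep_label]

# Toy training data (invented); "repost" appears in every record, so the
# high-frequency filter removes it from the vocabulary.
TRAIN_TEXTS = [
    "repost gas pipeline leak reported near subway station",
    "repost fire breaks out at chemical warehouse",
    "repost road collapse near metro construction site",
    "repost flood warning issued for downtown tunnel",
    "repost big discount sale this weekend",
    "repost celebrity concert tickets on sale",
    "repost new phone discount coupon giveaway",
    "repost weekend movie tickets giveaway",
]
TRAIN_LABELS = ["relevant"] * 4 + ["noise"] * 4

model = TfidfNaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
kept = screen(model, ["gas leak at warehouse", "discount tickets giveaway"])
print(kept)
```

Feeding fractional TF-IDF weights into a multinomial model is a common practical shortcut rather than a strict probabilistic model. The abstract does not say which Naive Bayes variant the authors used.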

Keywords: urban security public opinion information; Naive Bayes; text classification

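Classification numbers: G206 [Cultural Science: Communication Studies]; TP391.1 [Automation and Computer Technology: Computer Application Technology]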

 
