Research on a Health Misinformation Identification Method Based on Multi-dimensional Feature Fusion: Based on the LightGBM Algorithm

Cited by: 11


Authors: Jin Yan (金燕), Xu Hexian (徐何贤), Bi Chongwu (毕崇武)

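Affiliations: [1] School of Information Management, Zhengzhou University, Zhengzhou, Henan 450001; [2] Zhengzhou Data Science Research Center, Zhengzhou, Henan 450001; [3] School of Politics and Public Administration, Zhengzhou University, Zhengzhou, Henan 450001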

Source: Information Studies: Theory & Application (《情报理论与实践》), 2023, No. 8, pp. 156-164 (9 pages)

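Funding: An outcome of the National Social Science Fund of China General Project "Research on Online Health Information Quality Governance from the Perspective of Group Participation" (Project No. 21BTQ054).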

Abstract: [Purpose/significance] To address the low efficiency and limited accuracy of automatic health misinformation identification, this paper proposes an identification method based on multi-dimensional feature fusion. [Method/process] First, an indicator system of health misinformation features is constructed along three dimensions: content features, emotional features, and publisher features. Second, a different extraction method is applied to each feature type, and the results are converted into structured data that can be processed. Third, a LightGBM classification model fuses the multi-dimensional feature attributes to identify health misinformation automatically. Finally, the method is validated empirically on health information from WeChat official accounts. [Result/conclusion] The method reaches an accuracy of 92.22% on the WeChat official account dataset and outperforms identification methods based on a single feature dimension (content, emotion, or publisher). To a certain extent, it alleviates the poor timeliness, low efficiency, and limited volume of manual identification, and it automates health misinformation identification more comprehensively and with accuracy closer to that of manual identification.

Keywords: multi-dimensional features; feature fusion; health misinformation; LightGBM; identification method; information governance

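CLC numbers: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP309 [Automation and Computer Technology: Computer Science and Technology]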

 
