Topic Mining and Sentiment Analysis of Academic Misconduct Events Based on LDA and snowNLP


Author: WANG Zhidi (王智迪) (Library, Jilin Animation Institute, Changchun 130012, China)

Affiliation: [1] Library, Jilin Animation Institute, Changchun 130012, Jilin, China

Source: Software Guide (《软件导刊》), 2025, No. 3, pp. 92-98 (7 pages)

Funding: Special Project on Ideological and Political Work in Higher Education, Education Department of Jilin Province (JJKH20241447SZ).

Abstract: This study performs topic mining and sentiment analysis on Bilibili video comments about the incident in which 11 students at Huazhong Agricultural University jointly reported Professor Huang Feiruo for academic misconduct. A Python crawler built on the Requests library collected 4,405 comments from Bilibili. The comments were then preprocessed: word segmentation with Jieba, stop-word removal, a custom dictionary, and TF-IDF text vectorization. A Latent Dirichlet Allocation (LDA) topic model was fitted to the comments; perplexity and pyLDAvis visualization indicated an optimal number of 15 topics, from which hot topics were identified. The topic mining results were divided into three datasets, and the snowNLP library was used to classify each comment as positive, negative, or neutral. Statistical analysis of the labeled comments, with visualization, shows the number of comments in each sentiment category and the distribution of sentiment tendencies. The topic mining reveals several hot topics, including justice and effort, public concern over academic misconduct, and rationality and courage. The sentiment analysis shows that public comments span positive, negative, and neutral sentiment: most comments support and praise the students' report, while others express concern and anger about the academic environment. The study has limitations. Future work could broaden the scope by collecting and analyzing data from more social media platforms to obtain a more complete picture of public opinion and sentiment, and could combine quantitative and qualitative methods to examine the underlying causes and impact mechanisms of academic misconduct incidents.
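The abstract describes labeling each comment as positive, negative, or neutral from a snowNLP score and then counting comments per category. A minimal sketch of that step follows. It is not the author's code: the cut-offs 0.4 and 0.6 and the names `label_sentiment`, `sentiment_distribution`, and `snownlp_scorer` are assumptions made for illustration, since the abstract does not report the thresholds used. The only snowNLP call relied on is `SnowNLP(text).sentiments`, which returns a value between 0 and 1.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical cut-offs; the abstract does not state the thresholds used.
NEG_MAX = 0.4
POS_MIN = 0.6


def label_sentiment(score: float, neg_max: float = NEG_MAX,
                    pos_min: float = POS_MIN) -> str:
    """Map a sentiment score in [0, 1] to one of three categories."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    if score < neg_max:
        return "negative"
    if score > pos_min:
        return "positive"
    return "neutral"


def sentiment_distribution(comments: Iterable[str],
                           scorer: Callable[[str], float]) -> Counter:
    """Count comments per sentiment category using the supplied scorer."""
    return Counter(label_sentiment(scorer(text)) for text in comments)


def snownlp_scorer(text: str) -> float:
    """Score a comment with snowNLP (requires `pip install snownlp`)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from snownlp import SnowNLP
    return SnowNLP(text).sentiments


if __name__ == "__main__":
    # Stand-in scores for illustration only; these are not data from the study.
    demo_scores = {"comment A": 0.92, "comment B": 0.15, "comment C": 0.50}
    print(sentiment_distribution(demo_scores, demo_scores.get))
```

Passing the scorer as an argument keeps the thresholding logic independent of snowNLP, so `snownlp_scorer` can be swapped in for real comments once the package is installed.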

Keywords: academic misconduct; LDA; snowNLP; topic mining; sentiment analysis

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
