Topic Mining of Civil Aviation Passenger Comments Based on BERTopic-Word2Vec

Authors: KE Xi-jun, WEN Jia-jun, XU Hai-wen[1], ZHAO Yun-xiang

Affiliations: [1] School of Science, Civil Aviation Flight University of China, Deyang, Sichuan 618000, China; [2] School of Computer Science, Civil Aviation Flight University of China, Deyang, Sichuan 618000, China

Source: Mathematics in Practice and Theory (《数学的实践与认识》), 2025, No. 3, pp. 155-166 (12 pages)

Funding: General Program of the National Natural Science Foundation of China (12371301).

Abstract: Mining the topics and keywords in passenger review data provides important guidance for airlines formulating their service policies. Because civil aviation reviews have distinctive characteristics, they call for targeted modeling. First, back-translation is designed to augment the dataset, the THULAC word-segmentation dictionary and the stop-word list are extended, and TF-IDF feature extraction is applied, so that complex reviews can be processed. Second, to suit the characteristics of short Chinese texts, BERTopic is improved by fine-tuning its embedding model, enhancing its clustering function, and carefully tuning its parameters, which significantly improves the model's ability to partition topics and to capture complex relationships among keywords. A Word2Vec-TopN model is then constructed to analyze the vectors of both topics and keywords, strengthening contextual connections. In addition, taking topic strength and centrality into account, a semantic network of the core topics is built to analyze the relationships between topics and keywords. Finally, guided by the mining results for the core topics and keywords, the prominent problems within each topic are analyzed and suggestions for improvement are put forward.
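The abstract does not define how topic strength or centrality is computed. As a hedged illustration only, the sketch below assumes topic strength is the share of non-outlier reviews assigned to each topic (topic label -1 follows BERTopic's convention for outlier documents) and builds a keyword co-occurrence network whose centrality is the normalized degree. The function names and the toy, already-segmented reviews (English glosses for readability) are hypothetical, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def topic_strength(doc_topics):
    """Assumed definition: share of non-outlier documents assigned to each topic."""
    counts = Counter(t for t in doc_topics if t != -1)  # -1 = BERTopic outlier label
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cooccurrence_network(docs, keywords):
    """Edge weight = number of reviews in which both keywords appear."""
    kw = set(keywords)
    edges = Counter()
    for tokens in docs:
        present = sorted(kw.intersection(tokens))  # sorted so each pair has one key
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges, keywords):
    """Normalized degree: number of distinct neighbours divided by (n - 1)."""
    neighbours = {k: set() for k in keywords}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(keywords)
    if n < 2:
        return {k: 0.0 for k in keywords}
    return {k: len(v) / (n - 1) for k, v in neighbours.items()}


# Toy data (hypothetical): segmented reviews, their topic labels, and topic keywords.
docs = [
    ["flight", "delay", "refund"],
    ["flight", "delay", "service"],
    ["service", "meal"],
    ["refund", "customer_service", "service"],
]
doc_topics = [0, 0, 1, -1]
keywords = ["flight", "delay", "refund", "service", "meal"]

strength = topic_strength(doc_topics)
edges = cooccurrence_network(docs, keywords)
centrality = degree_centrality(edges, keywords)

if __name__ == "__main__":
    print(strength)
    print(max(centrality, key=centrality.get))  # most central keyword
```

Normalizing by n - 1 matches the convention of `networkx.degree_centrality`; a weighted variant could instead sum the co-occurrence counts on each node's edges.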

Keywords: civil aviation passengers; BERTopic; topic analysis; Word2Vec; semantic network

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
