Interactive Text Topic Mining Based on the LDA Model: A Case Study of Customer Service Chat Records  (Cited by: 9)

Authors: LI Li (李莉) [1], LIN Yu-lan (林雨蓝) [1], YAO Rui-bo (姚瑞波) [2]

Affiliations: [1] School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China; [2] Focus Technology Co., Ltd., Nanjing 210061, Jiangsu, China

Source: Information Science (《情报科学》), 2018, No. 10, pp. 64-70 (7 pages)

Funding: National Natural Science Foundation of China (71771122; 71271115)

Abstract: [Purpose/Significance] This study mines the topics contained in customer service chat records in order to guide the design and optimization of automated customer service question-answering (QA) systems. [Method/Process] Targeting the interactive short texts found in the customer service chat records of an insurance website, the paper preprocesses the text through session segmentation, word segmentation and extraction, and vocabulary filtering; performs feature selection through noun phrase extraction, high-frequency word extraction, and the introduction of an external dataset; and finally applies LDA modeling to obtain the topics of the interactive text. [Result/Conclusion] The model results show that users mainly focus on topics such as insurance details, insured amounts, and types of insurance. The discussion topics under different themes are fairly independent of one another, while themes and discussion topics are strongly correlated. The LDA results successfully surface the topics users care about, which provides guidance for e-commerce website operators designing and optimizing automated QA systems.
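The pipeline described in the abstract (filter the vocabulary, keep high-frequency words, then fit LDA) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name their tools or parameters, so the sample sessions, the stopword list, the `min_count` threshold, the hyperparameters `alpha` and `beta`, and the function names `preprocess` and `lda_gibbs` are all assumptions made for illustration. The sketch treats each chat session as one document, uses English toy text in place of segmented Chinese, omits noun phrase extraction and the external dataset, and fits LDA with a plain collapsed Gibbs sampler using only the Python standard library.

```python
import random
import re
from collections import Counter

# Hypothetical stopword list and toy sessions (not from the paper).
STOPWORDS = {"the", "a", "is", "i", "to", "of", "my", "what", "does",
             "do", "how", "for", "and", "you", "it", "this", "much"}

sessions = [
    "What does this policy cover? Does the policy cover accidents?",
    "How much is the premium? Is the premium paid monthly?",
    "Does accident insurance cover travel? I need travel insurance.",
    "What is the insured amount for this policy? Is the amount fixed?",
    "How much premium for travel insurance and accident insurance?",
]

def preprocess(sessions, min_count=2):
    """Tokenize each session, drop stopwords, keep only high-frequency words."""
    tokenized = [[w for w in re.findall(r"[a-z]+", s.lower())
                  if w not in STOPWORDS] for s in sessions]
    freq = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if freq[w] >= min_count] for doc in tokenized]

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document and per-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional p(z = k | everything else), up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

docs = preprocess(sessions)
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(f"topic {k}:", [w for w, _ in counts.most_common(3)])
```

On real chat logs, the whitespace-style tokenizer would be replaced by a Chinese word segmenter, and the number of topics would need to be chosen empirically; neither choice is specified in this record.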

Keywords: interactive text; LDA model; topic mining

Classification code: G254 [Cultural Science: Library Science]

 
