An Accurate Topic Mining Solution Based on a Business Dictionary (一种基于业务词典的精准主题挖掘解决方案)

Authors: YANG Zhi[1], LIN Feng[1], HU Mu[1], MENG Qingqiang[1], ZHENG Haoquan[1]

Affiliation: [1] NARI Group Corporation/State Grid Electric Power Science Research Institute, Nanjing 210003

Source: Computer & Digital Engineering (《计算机与数字工程》), 2018, No. 8, pp. 1697-1702 (6 pages)

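Funding: Supported by the NARI Group Corporation science and technology project "Research on Big Data Applications in Smart Grid Production Dispatching" (No. 524606160204)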

Abstract: Text mining is an important research direction in data mining, and many research institutions and teams have proposed valuable general-purpose text mining algorithms. Because industries and scenarios differ, however, general-purpose analysis algorithms struggle to accurately mine the potential value of log data in the electric power industry. In a power fault scenario, for example, it is hard to find the words that are semantically related to a given topic. To address this problem, the paper proposes an accurate topic mining solution based on a business dictionary. First, business experts create a business dictionary for the power industry and the specific scenario. Next, the preprocessed document set is segmented with the business dictionary into professional terms, and hot word analysis is performed on the result. Finally, a relevance index is used to compute the relevance degree of every word, and the resulting relevance matrix is used to analyze semantic associations for a specified set of topic words. The solution has been validated on PMS fault logs; the results show that the analysis of factors related to the topic words is accurate and effective, and that it improves the efficiency of fault analysis.
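The three steps named in the abstract (dictionary-based segmentation, hot word analysis, relevance to a topic word) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the relevance index, so the sketch assumes a simple conditional co-occurrence ratio (the share of documents containing the topic word that also contain the other term), and it assumes forward maximum matching over whitespace tokens for segmentation. The dictionary entries, log lines, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical business dictionary and fault logs (illustrative only).
dictionary = {"transformer", "oil leak", "overheating", "circuit breaker", "trip"}
logs = [
    "transformer oil leak detected after overheating",
    "circuit breaker trip during storm",
    "transformer overheating alarm raised",
    "transformer oil leak repaired",
]


def segment(text, dictionary):
    """Forward maximum matching over tokens; keeps only dictionary terms."""
    tokens = text.lower().split()
    max_words = max(len(term.split()) for term in dictionary)
    terms, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word terms win over their parts.
        for size in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if candidate in dictionary:
                terms.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary term starts at this token; skip it
    return terms


def hot_words(segmented_docs, top_n=5):
    """Rank dictionary terms by total frequency across all documents."""
    counts = Counter(term for doc in segmented_docs for term in doc)
    return counts.most_common(top_n)


def relevance(segmented_docs, topic):
    """Assumed index: fraction of topic-bearing documents that contain each term."""
    topic_docs = [set(doc) for doc in segmented_docs if topic in doc]
    if not topic_docs:
        return {}
    co_counts = Counter(t for doc in topic_docs for t in doc if t != topic)
    return {term: n / len(topic_docs) for term, n in co_counts.items()}


docs = [segment(line, dictionary) for line in logs]
print(hot_words(docs, top_n=3))
print(relevance(docs, "oil leak"))
```

Computing `relevance` for every dictionary term in turn would yield a term-by-term matrix, which is one plausible reading of the relevance matrix the abstract mentions. For Chinese log text, the whitespace tokenizer would need to be replaced by character-level matching or a segmenter that accepts a custom dictionary.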

Keywords: text mining; knowledge discovery; topic mining; business-dictionary-based topic mining; impact factor

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
