Authors: YANG Zhi[1], LIN Feng[1], HU Mu[1], MENG Qingqiang[1], ZHENG Haoquan[1]
Affiliation: [1] NARI Group Corporation / State Grid Electric Power Research Institute, Nanjing 210003
Source: Computer & Digital Engineering (《计算机与数字工程》), 2018, No. 8, pp. 1697-1702 (6 pages)
Funding: Supported by the NARI Group Corporation science and technology project "Research on Big Data Applications in Smart Grid Production Dispatching" (No. 524606160204)
Abstract: Text mining is an important research direction within data mining, and many research institutions and teams have proposed useful general-purpose text mining algorithms. Because industries and scenarios differ, however, general-purpose analysis algorithms struggle to extract the latent value of log data in the electric power industry accurately. In a power fault scenario, for example, it is hard to find the words that are semantically related to a specified topic. To address this problem, the paper proposes an accurate topic mining solution based on a business dictionary. First, business experts create a business dictionary for the power industry and the specific scenario. Next, the preprocessed document set is segmented with the business dictionary into professional terms, and hot word analysis is performed on the result. Finally, a relevance index is used to compute the relevance degree of every word, and the resulting relevance matrix is used to analyze semantic association for the specified set of topic words. The solution has been validated on PMS fault logs; the results show that the factors related to the topic words are identified accurately and effectively, which improves the efficiency of fault analysis.
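The three steps named in the abstract (dictionary-based segmentation, hot word analysis, relevance to a topic word) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the paper's segmentation method or its relevance index, so forward maximum matching and a simple document co-occurrence share stand in for them, and `DICTIONARY`, `LOGS`, `segment`, `hot_words`, and `relevance` are hypothetical names and data, not taken from the paper or from PMS.

```python
from collections import Counter

# Hypothetical business dictionary and fault logs (illustrative only).
DICTIONARY = {"transformer", "overheating", "oil leak", "breaker", "trip"}
LOGS = [
    "transformer overheating after oil leak",
    "breaker trip on feeder 3",
    "transformer overheating, breaker trip",
]


def segment(text, dictionary):
    """Forward maximum matching: keep only terms found in the dictionary."""
    max_len = max(len(term) for term in dictionary)
    terms, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                terms.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return terms


def hot_words(docs_terms, top_n=3):
    """Most frequent dictionary terms across all documents."""
    counts = Counter(term for terms in docs_terms for term in terms)
    return counts.most_common(top_n)


def relevance(docs_terms, topic):
    """Assumed stand-in index: the share of documents containing the topic
    word that also contain each other term."""
    topic_docs = [set(terms) for terms in docs_terms if topic in terms]
    if not topic_docs:
        return {}
    counts = Counter(t for doc in topic_docs for t in doc if t != topic)
    return {term: n / len(topic_docs) for term, n in counts.items()}


docs = [segment(log, DICTIONARY) for log in LOGS]
print(hot_words(docs))
print(relevance(docs, "overheating"))
```

Restricting segmentation to dictionary terms is what keeps multi-word domain terms such as "oil leak" intact; a general-purpose tokenizer would split them and dilute the co-occurrence counts.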
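Keywords: text mining; knowledge discovery; topic mining; business-dictionary-based topic mining; impact factor
Classification No.: TP393 [Automation and Computer Technology: Computer Application Technology]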