Affiliation: [1] Engineering and Technical College of Chengdu University of Technology, Leshan, Sichuan 614007
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 9, pp. 8-11 (4 pages)
Funding: Key Natural Science Project of Sichuan Province (A22012003); Key Project of the Leshan Science and Technology Bureau, Sichuan Province (14GZD050)
Abstract: Extracting Internet public opinion information is the most critical part of a public opinion analysis system and provides the data foundation for public opinion analysis and statistics. To this end, a public opinion information extraction scheme based on topic clues is designed and implemented. The scheme logically partitions each public opinion page using topics as clues, and its extraction algorithm is built on a breadth-first search of the DOM tree. Effective extraction is achieved by setting a minimum repeated-topic threshold θ, letting users customize the extraction format, and removing duplicate and noisy information. Extraction experiments on multiple forums show that the scheme performs well, with high recall, precision, and F-measure, and that it can reliably extract public opinion information such as forum posts and comments.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
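
The abstract names the technique (breadth-first search over the DOM tree, a minimum repeated-topic threshold θ, and duplicate removal) but gives no algorithmic detail. The following is therefore only a minimal sketch of how such a scheme might look, not the authors' implementation. It assumes that a "topic" block is a set of sibling elements sharing the same tag and `class` attribute, and that a parent qualifies once at least θ such siblings are found. The names `Node`, `TreeBuilder`, `find_topic_blocks`, `extract`, and the sample HTML are all hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A very small DOM node: tag, attributes, children, direct text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.text = (self.current.text + " " + data).strip()


def signature(node):
    # Assumption: siblings with the same tag and class are "the same topic".
    return (node.tag, node.attrs.get("class"))


def find_topic_blocks(root, theta=3):
    """Breadth-first search for the first node whose children contain
    at least `theta` siblings with an identical signature."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        counts = Counter(signature(c) for c in node.children)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n >= theta:
                return [c for c in node.children if signature(c) == sig]
        queue.extend(node.children)
    return []


def text_of(node):
    """Concatenate the text of a node and all of its descendants."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract(html, theta=3):
    builder = TreeBuilder()
    builder.feed(html)
    texts = [text_of(b) for b in find_topic_blocks(builder.root, theta)]
    # Drop empty entries and exact duplicates while keeping page order.
    return list(dict.fromkeys(t for t in texts if t))


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a><a href="/new">New</a></div>
<div id="posts">
<div class="post"><span class="author">alice</span><p>First reply</p></div>
<div class="post"><span class="author">bob</span><p>Second reply</p></div>
<div class="post"><span class="author">carol</span><p>Third reply</p></div>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract(SAMPLE, theta=3))
```

In this sketch, raising θ makes the search stricter: navigation bars and other short repeated runs fall below the threshold and are skipped, which is one plausible reading of how the threshold helps filter noise.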