Research and Implementation of Information Extraction Technology in Network Public Opinion (网络舆情信息提取技术研究与实现)

Cited by: 4


Authors: Liu Huachun (刘华春) [1], Wang Xingjie (王星捷) [1]

Affiliation: [1] Engineering and Technical College of Chengdu University of Technology, Leshan, Sichuan 614007

Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 9, pp. 8-11 (4 pages)

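Funding: Sichuan Provincial Natural Science Key Project (A22012003); Key Project of the Science and Technology Bureau of Leshan, Sichuan Province (14GZD050)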

Abstract: Extracting public opinion information from the Web is the most critical part of a public opinion analysis system and provides the data foundation for opinion analysis and opinion statistics. To this end, a scheme for extracting public opinion information based on topic clues is designed and implemented. The scheme logically partitions a public opinion page using topics as clues; an extraction algorithm is designed around a breadth-first search of the DOM tree; and effective extraction is achieved by setting a minimum repeated-topic threshold θ, letting users customize the extraction format, and removing duplicate and noisy information. Extraction experiments on several forums show that the scheme performs well, with high recall, precision (correct rate), and F-measure, and that it can reliably extract public opinion information such as forum posts and comments.
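The abstract names three ingredients: a breadth-first search of the DOM tree, a minimum repeated-topic threshold θ, and de-duplication. This record does not give the paper's actual algorithm, so the following is only a minimal sketch under stated assumptions: a "topic" is assumed to be a child element whose tag and `class` repeat at least θ times under the same parent, and the names `Node`, `TreeBuilder`, `extract_topics`, and `SAMPLE_PAGE` are hypothetical, introduced here for illustration.

```python
from collections import Counter, deque
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A minimal DOM-like node (hypothetical helper, not from the paper)."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def signature(self):
        # Assumption: siblings with the same tag and class are the same kind of block.
        return (self.tag, self.attrs.get("class") or "")

    def text(self):
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].items.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.stack[-1].items.append(data)


def extract_topics(html, theta=3):
    """Breadth-first search for the shallowest node with >= theta repeated children."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    queue = deque([builder.root])
    while queue:
        node = queue.popleft()
        counts = Counter(child.signature() for child in node.children)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n >= theta:
                seen, topics = set(), []
                for child in node.children:
                    text = child.text()
                    # Keep only the repeated blocks; skip empty and duplicate text.
                    if child.signature() == sig and text and text not in seen:
                        seen.add(text)
                        topics.append(text)
                return topics
        queue.extend(node.children)
    return []


SAMPLE_PAGE = """
<html><body>
<div class="nav"><a href="/">Home</a><a href="/login">Login</a></div>
<div class="thread">
  <div class="post"><span>alice</span><p>First reply</p></div>
  <div class="post"><span>bob</span><p>Second reply</p></div>
  <div class="post"><span>bob</span><p>Second reply</p></div>
  <div class="post"><span>carol</span><p>Third reply</p></div>
  <div class="ad">Buy now</div>
</div>
</body></html>
"""

if __name__ == "__main__":
    for topic in extract_topics(SAMPLE_PAGE, theta=3):
        print(topic)
```

In this sketch the two navigation links fall below θ = 3 and are skipped, the non-repeating `ad` block is treated as noise, and the duplicated post is emitted once. Raising θ above the number of posts on the page yields an empty result, which is the trade-off such a threshold controls.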

Keywords: public opinion information; Web information extraction; topic clue; DOM tree

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
