Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification  Cited by: 10

Authors: DUAN Qingling[1], ZHANG Lu[1], LIU Yiran[1], WANG Shasha (College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Agricultural Information Technology Limited Liability Company of Beijing, Beijing 100081, China)

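Institutions: [1] College of Information and Electrical Engineering, China Agricultural University, Beijing 100083; [2] Agricultural Information Technology Limited Liability Company of Beijing, Beijing 100081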

Source: Transactions of the Chinese Society for Agricultural Machinery, 2018, No. 7, pp. 160-167 (8 pages)

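Funding: National High Technology Research and Development Program of China (863 Program) project (2013AA102306); National Key Technology R&D Program of the 12th Five-Year Plan project (2012BAD35B06)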

Abstract: With the rapid development of the Internet, network information is growing quickly, and agricultural network information is no exception. Extracting hot words from this mass of information is of great significance for monitoring and analyzing agricultural public opinion. Hot word extraction has already received some research attention, but existing methods still suffer from problems such as poor targeting and cannot meet the personalized needs of users in different agricultural industries. Therefore, a method for automatically extracting hot words based on agricultural network information classification was proposed. First, the texts were classified with a multi-label classification algorithm, and a separate corpus was built for each classification category. Second, hot word candidates were extracted for each category with a method based on information entropy. Third, the hotness of each candidate was calculated with a method based on time variation. Finally, the candidates were sorted by hotness, and the hot words were obtained from the ranking. A total of 15 354 texts collected from agricultural websites were used in the experiment, and hot words for a specified time period were obtained automatically. The results showed that hot word extraction accuracy exceeded 0.9. This indicates that the proposed method can extract agricultural hot words with high quality and help different agricultural user groups discover and analyze hot topics in their industries.
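The abstract names the three stages but does not give their formulas. The sketch below chains them together under explicit assumptions, and it is not the authors' implementation. The assumptions are:

- Documents are already tokenized and labelled. The multi-label classifier itself is not shown.
- "Information entropy" is taken to mean left/right neighbour (boundary) entropy of n-grams.
- The time-variation heat score f_cur * ln((f_cur + 1) / (f_prev + 1)) is a hypothetical choice.

All function names, thresholds, period strings and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict


def build_corpora(docs):
    """Step 1: one corpus per category. Each doc is (tokens, labels, period);
    a multi-label document is copied into every category it carries."""
    corpora = defaultdict(list)
    for tokens, labels, period in docs:
        for label in labels:
            corpora[label].append((tokens, period))
    return dict(corpora)


def entropy(counter):
    """Shannon entropy (natural log) of a frequency distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def extract_candidates(token_lists, n_max=2, min_freq=2, min_entropy=0.5):
    """Step 2 (assumed criterion): keep n-grams that occur at least min_freq
    times and whose left and right neighbour entropies are both >= min_entropy,
    i.e. units seen in varied contexts rather than fragments of a longer phrase."""
    freq, left, right = Counter(), defaultdict(Counter), defaultdict(Counter)
    for tokens in token_lists:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for n in range(1, n_max + 1):
            for i in range(1, len(padded) - n):
                gram = " ".join(padded[i:i + n])
                freq[gram] += 1
                left[gram][padded[i - 1]] += 1
                right[gram][padded[i + n]] += 1
    return {g for g, f in freq.items()
            if f >= min_freq
            and min(entropy(left[g]), entropy(right[g])) >= min_entropy}


def ngram_counts(token_lists, n_max=2):
    """Frequency of every 1..n_max-gram across the given token lists."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def hot_words(corpus, current, previous, top_k=10, n_max=2, min_freq=2, min_entropy=0.5):
    """Steps 3-4 (assumed heat formula): heat = f_cur * ln((f_cur + 1) / (f_prev + 1)),
    so a term scores high when it is frequent now and grew since the previous
    period. Candidates are ranked by heat; at most top_k positive scores are returned."""
    cands = extract_candidates([t for t, _ in corpus], n_max, min_freq, min_entropy)
    cur = ngram_counts([t for t, p in corpus if p == current], n_max)
    prev = ngram_counts([t for t, p in corpus if p == previous], n_max)
    heat = {c: cur[c] * math.log((cur[c] + 1) / (prev[c] + 1)) for c in cands}
    ranked = sorted(cands, key=lambda c: (-heat[c], c))
    return [c for c in ranked if heat[c] > 0][:top_k]


# Toy, hand-made data (not from the paper): (tokens, labels, period).
docs = [
    (["corn", "feed", "cost", "up"], ["livestock", "grain"], "2018-06"),
    (["soy", "feed", "cost", "down"], ["livestock"], "2018-06"),
    (["today", "pork", "price", "rises"], ["livestock"], "2018-07"),
    (["hog", "pork", "price", "falls"], ["livestock"], "2018-07"),
]
corpora = build_corpora(docs)
print(hot_words(corpora["livestock"], current="2018-07", previous="2018-06"))  # → ['pork price']
```

With this toy data, the entropy filter keeps "pork price" and "feed cost". It rejects "pork" and "price" on their own, because each always appears next to the other. "feed cost" is then dropped at the heat stage because it does not occur in the current period.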

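Keywords: agricultural network information; agricultural public opinion monitoring; hot words; multi-label classification; hotness calculation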

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
