Research on Clustering-Based Extraction of Forestry Pest and Disease Entities  Cited by: 2

ON CLUSTERING-BASED FORESTRY PEST AND DISEASES ENTITY EXTRACTION


Authors: Mao Lang (毛浪)[1], Zhao Chuangang (赵传钢)[1]

Affiliation: [1] School of Information, Beijing Forestry University, Beijing 100083

Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 3, pp. 37-40, 64 (5 pages)

Abstract: In research on information extraction based on semi-supervised and active learning, the distribution of samples within the dataset is rarely considered when the initial sample set is selected. Taking pest and disease extraction in the forestry domain as an example, we propose a clustering-based method that obtains information about how samples are distributed in the dataset and uses it to guide the selection of both the initial sample set and the samples annotated during iteration. Experimental results show that, compared with a randomly chosen initial training set, the clustering-based method improves the model's F-score across different numbers of annotated samples. Compared with using active learning alone, it saves about 30% of the manual annotation effort at similar performance.

Keywords: information extraction; text clustering; forestry pest and disease entities; active learning; semi-supervised learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
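
The abstract describes using clustering to guide the choice of the initial sample set. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not name a clustering algorithm or a feature representation, so plain k-means over small numeric vectors is assumed here, and all names (`kmeans`, `select_initial_samples`, `points`) are hypothetical. One sample is picked per cluster, the one closest to the cluster centre, so that the seed set covers each region of the data.

```python
from math import dist


def mean_point(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def assign(points, centroids):
    """Group point indices by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for i, p in enumerate(points):
        j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[j].append(i)
    return clusters


def kmeans(points, k, iters=20):
    """Plain k-means (an assumption; the source names no algorithm).

    Initial centroids are chosen deterministically: the first point,
    then repeatedly the point farthest from all centroids chosen so far.
    """
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    for _ in range(iters):
        clusters = assign(points, centroids)
        updated = [
            mean_point([points[i] for i in members]) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # assignments no longer change the centres
            break
        centroids = updated
    return centroids, assign(points, centroids)


def select_initial_samples(points, k):
    """Return one index per non-empty cluster: the point nearest its centroid."""
    centroids, clusters = kmeans(points, k)
    seeds = []
    for c, members in zip(centroids, clusters):
        if members:
            seeds.append(min(members, key=lambda i: dist(points[i], c)))
    return sorted(seeds)


# Toy feature vectors standing in for document or sentence representations.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.4, 0.3),     # group A
    (5.0, 5.0), (6.0, 5.0), (5.4, 5.3),     # group B
    (10.0, 0.0), (11.0, 0.0), (10.4, 0.3),  # group C
]
seeds = select_initial_samples(points, 3)
print(seeds)  # indices of the samples to annotate first
```

A random draw of three samples could easily miss one of the groups entirely, whereas the cluster-guided choice takes one sample from each. Selecting samples during later active-learning iterations, which the abstract also mentions, is not covered by this sketch.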

 
