Active Learning Based Text Categorization  (cited by: 5)

Authors: 覃刚力[1] 黄科[2] 杨家本[1]

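Affiliations: [1] Department of Automation, Tsinghua University, Beijing 100084 [2] Department of Computer Science, Tsinghua University, Beijing 100084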

Source: Computer Science (《计算机科学》), 2003, No. 10, pp. 45-48 (4 pages)

Abstract: 1 Introduction. With the rapid spread and development of the Internet, the number of electronic documents on the web has surged. While users enjoy the wealth of information it provides, they increasingly feel overwhelmed by its size and complexity. Yet documents on the web are not managed in any organized way; they form one large, unordered collection of data.

In text categorization, unlabeled documents generally far outnumber labeled ones. Text categorization is a classification problem in a high-dimensional vector space, and more training samples generally improve the accuracy of a text classifier. How to add unlabeled documents to the training set, and so expand it, is therefore a valuable problem. This paper introduces the theory of active learning and applies it to text categorization, exploring how unlabeled documents can be used to improve classifier accuracy. The expectation is that adopting a relatively large number of unlabeled document samples will improve the accuracy of the text classifier. We propose an active learning based algorithm for text categorization. Experiments on the Reuters news corpus show that, when enough training samples are available, the algorithm effectively improves the text classifier's accuracy by adopting unlabeled document samples.

Keywords: machine learning; active learning; document classification algorithm; feature extraction

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
