基于支持向量机与无监督聚类相结合的中文网页分类器  被引量:108

A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering

在线阅读下载全文

作  者:李晓黎[1] 刘继敏[1] 史忠植[1] 

机构地区:[1]中国科学院计算技术研究所,北京100080

出  处:《计算机学报》2001年第1期62-68,共7页Chinese Journal of Computers

基  金:国家自然科学基金!(6 980 30 10 );国家"八六三"高技术研究发展计划!(86 3-5 11-946 -0 10 )资助

摘  要:提出了一种将支持向量机与无监督聚类相结合的新分类算法 ,给出了一种新的网页表示方法并应用于网页分类问题 .该算法首先利用无监督聚类分别对训练集中正例和反例聚类 ,然后挑选一些例子训练 SVM并获得 SVM分类器 .任何网页可以通过比较其与聚类中心的距离决定采用无监督聚类方法或 SVM分类器进行分类 .该算法充分利用了 SVM准确率高与无监督聚类速度快的优点 .实验表明它不仅具有较高的训练效率 ,而且有很高的精确度 .This paper presents a new algorithm that combines Support Vector Machine (SVM) and unsupervised clustering. After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters positive and negative examples respectively by the unsupervised clustering algorithm (UC), which will produce a number of positive and negative centers. Then, it selects only some of the examples to input to SVM according to ISUC algorithm. At the end, it constructs a classifier through SVM learning. Any text can be classified by comparing the distance of clustering centers or by SVM. If the text nears one cluster center of a category and far away from all the cluster centers of other categories, UC can classify it rightly with high possibility, otherwise SVM is employed to decide the category it belongs. The algorithm utilizes the virtues of SVM and unsupervised clustering. The experiment shows that it not only improves training efficiency, but also has good precision.

关 键 词:支持向量机 无监督聚类 中文网页分类器 INTERNET 机器学习 

分 类 号:TP393.409[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象