Web Text Classification Algorithm Based on Fuzzy Weighted Proximal Support Vector Machine  Cited by: 2

Authors: Wang Ping (王平)[1], Wu Jian (吴剑)[1]

Affiliation: [1] School of Information Engineering, Nanchang University, Nanchang 330031, Jiangxi, China

Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 5, pp. 54-58 (5 pages)

Funding: Jiangxi Provincial Science and Technology Support Program (2009BGB01900); Natural Science Foundation of Jiangxi Province (2009JX02367)

Abstract: Web text classification is an active research topic in data mining. To address the high dimensionality and class imbalance of Web text data sets, this paper introduces fuzzy membership and a balance factor into the proximal support vector machine (PSVM) and proposes the fuzzy weighted proximal support vector machine (FWPSVM). First, the average density of the samples is computed and combined with the sample counts to obtain the balance factor; this overcomes the drawback of traditional weighted algorithms, which set weights from sample counts alone, and mitigates the shift of the classification hyperplane caused by imbalanced data. Next, the fuzzy membership of each sample is computed to eliminate the classification errors caused by noise and outliers. Because PSVM is markedly faster than the standard SVM, it is better suited to classifying high-dimensional data. Experiments indicate that the proposed algorithm effectively improves classification accuracy on imbalanced data and yields some improvement in both training speed and classification quality on Web text.
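
The abstract does not give the formulas, so the following is only a minimal sketch of how per-sample weights can enter a linear PSVM, not the authors' implementation. It assumes the standard closed-form PSVM solution with a diagonal weight on the squared slack terms, a centroid-distance fuzzy membership (a common choice in fuzzy SVMs, swapped in here for illustration), and a balance factor that the caller supplies per class, since the paper's density-based formula is not stated in this record. The names `centroid_membership`, `fit_weighted_psvm`, `predict`, and the toy data are all hypothetical.

```python
import numpy as np


def centroid_membership(A, d, delta=1e-6):
    """Assumed membership rule: samples far from their class centroid get less weight."""
    s = np.empty(len(d), dtype=float)
    for label in (-1, 1):
        mask = d == label
        dist = np.linalg.norm(A[mask] - A[mask].mean(axis=0), axis=1)
        s[mask] = 1.0 - dist / (dist.max() + delta)  # lies in (0, 1]
    return s


def fit_weighted_psvm(A, d, s, nu=1.0):
    """Closed-form solution of
        min 1/2 (w'w + gamma^2) + nu/2 * sum_i s_i * xi_i^2
        s.t. d_i (x_i'w - gamma) + xi_i = 1,
    which reduces to (I/nu + E' S E) z = E' S d with E = [A, -e], z = [w; gamma]."""
    E = np.hstack([A, -np.ones((A.shape[0], 1))])
    lhs = np.eye(E.shape[1]) / nu + E.T @ (s[:, None] * E)
    rhs = E.T @ (s * d)
    z = np.linalg.solve(lhs, rhs)
    return z[:-1], z[-1]


def predict(A, w, gamma):
    """Label each row by the side of the plane x'w = gamma it falls on."""
    return np.where(A @ w - gamma >= 0, 1, -1)


# Toy imbalanced data: six positive samples, three negative samples.
A = np.array([[2, 2], [3, 3], [2, 3], [3, 2], [2.5, 2.5], [3, 2.5],
              [-2, -2], [-3, -2], [-2.5, -3]], dtype=float)
d = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1])

# Hypothetical balance factor: the minority class is up-weighted by a
# caller-chosen constant (the paper derives it from density and class size).
balance = np.where(d == 1, 1.0, 3.0)
s = centroid_membership(A, d) * balance

w, gamma = fit_weighted_psvm(A, d, s, nu=10.0)
pred = predict(A, w, gamma)
```

Training reduces to a single linear solve in the feature dimension plus one, which is the source of the speed advantage over a standard SVM that the abstract mentions.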

Keywords: text classification; proximal support vector machine; fuzzy membership; balance factor; imbalanced data

CLC number: TP391.4 [Automation and Computer Technology - Computer Application Technology]

 
