Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 5, pp. 54-58 (5 pages)
Funding: Jiangxi Provincial Science and Technology Support Program (2009BGB01900); Natural Science Foundation of Jiangxi Province (2009JX02367)
Abstract: Web text classification is an active research topic in data mining. To address the high dimensionality and class imbalance of Web text data sets, this paper introduces fuzzy membership and a balance factor into the proximal support vector machine (PSVM) and proposes the fuzzy weighted proximal support vector machine (FWPSVM). The method first computes the average density of the samples and combines it with the sample counts to obtain the balance factor. This overcomes the defect of traditional weighted algorithms, which set weights based on sample counts alone, and mitigates the shift of the classification hyperplane caused by imbalanced data. It then computes the fuzzy membership of each sample to eliminate the classification errors caused by noise and outliers. PSVM is noticeably faster than the standard SVM and is therefore better suited to high-dimensional data classification. Experiments indicate that the proposed algorithm effectively improves classification accuracy on imbalanced data and yields some improvement in training speed and classification quality on Web text.
Keywords: text classification; proximal support vector machine; fuzzy membership; balance factor; imbalanced data
Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
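
The pipeline outlined in the abstract (per-sample weights fed into a proximal SVM) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, and every function name here is hypothetical. It assumes the standard linear PSVM closed form with a diagonal weight matrix S, i.e. solving (I/nu + E^T S E) z = E^T S y with E = [X, -1] and z = [w; gamma]. The fuzzy membership uses a common distance-to-class-center rule, and `class_balance` is only an inverse-frequency placeholder: the record does not give the density-based balance factor, so it is not reproduced here.

```python
import numpy as np

def fuzzy_membership(X, y, delta=1e-3):
    """Assumed rule: membership shrinks with distance from the class center."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        mask = y == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        # delta > 0 keeps every membership strictly positive.
        s[mask] = 1.0 - dist / (dist.max() + delta)
    return s

def class_balance(y):
    """Placeholder: inverse class frequency, NOT the paper's density-based factor."""
    labels, counts = np.unique(y, return_counts=True)
    factor = dict(zip(labels, len(y) / (len(labels) * counts)))
    return np.array([factor[v] for v in y])

def fit_weighted_psvm(X, y, nu=1.0, weights=None):
    """Linear PSVM with per-sample weights; labels y must be +1 or -1."""
    n, d = X.shape
    s = np.ones(n) if weights is None else weights
    E = np.hstack([X, -np.ones((n, 1))])
    # One (d+1) x (d+1) linear solve replaces the QP of a standard SVM.
    lhs = np.eye(d + 1) / nu + E.T @ (s[:, None] * E)
    z = np.linalg.solve(lhs, E.T @ (s * y))
    return z[:-1], z[-1]

def predict(X, w, gamma):
    """Classify by the sign of x^T w - gamma."""
    return np.where(X @ w - gamma >= 0, 1, -1)

# Usage on a synthetic 10:90 imbalanced problem (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3.0, 0.5, size=(10, 2)),
               rng.normal(-3.0, 0.5, size=(90, 2))])
y = np.array([1] * 10 + [-1] * 90)
s = fuzzy_membership(X, y) * class_balance(y)
w, gamma = fit_weighted_psvm(X, y, nu=10.0, weights=s)
```

Because training reduces to a single linear system whose size depends on the feature dimension plus one, the sketch also hints at why PSVM is described as faster than a standard SVM; for very high-dimensional text features a sparse or kernelized formulation would likely be needed instead.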