GS-SVDD: A Multiclass Text Classification Algorithm  Cited by: 4

Multiclass Text Classification by Golden Selection and Support Vector Domain Description


Authors: WU De (吴德) [1], LIU San-yang (刘三阳) [1], LIANG Jin-jin (梁锦锦) [2]

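Affiliations: [1] School of Computer Science and Technology, Xidian University, Xi'an 710071, China [2] Faculty of Science, Xi'an Shiyou University, Xi'an 710065, China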

Source: Computer Science (《计算机科学》), 2016, No. 8, pp. 190-193 (4 pages)

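Funding: Supported by the National Natural Science Foundation of China (61373174), the Natural Science Foundation of the Shaanxi Provincial Department of Education (2010JK773), and the Doctoral Research Fund of Xi'an Shiyou University (Z10027)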

Abstract: Traditional multiclass text classification methods suffer from heavy computation and long training times. To address this, a classification algorithm for multiclass text is built on golden selection (GS) and support vector domain description (SVDD). GS-SVDD first uses the term frequency-inverse document frequency (TF-IDF) formula to compute the relative frequency of each term, sorts the terms in descending order of that value, and normalizes the resulting text vectors. It then applies the golden selection method to reduce the dimension of the text vectors, so that at most one redundant sample feature remains. Finally, it performs multiclass classification with SVDD, assigning a test document to the class with the smallest relative class distance. Numerical experiments on several datasets show that GS-SVDD offers better stability, higher classification accuracy, and shorter training time than "one-against-one" and "one-against-all" support vector machines, which makes it better suited to multiclass classification of massive text collections.
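The three steps named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no formulas, so every concrete choice below is an assumption. In particular, the "+1" IDF smoothing, the rule of keeping the top 0.618 share of ranked terms, the mean-centered sphere with a hard radius (a simplification that skips the quadratic program, slack variables, and kernels of real SVDD), and the definition of relative class distance as distance divided by radius are all hypothetical, as are the function names.

```python
import math
from collections import Counter

GOLDEN_RATIO = (math.sqrt(5) - 1) / 2  # about 0.618

def fit_vocab(docs):
    """Rank terms by summed TF-IDF weight (descending) and keep a golden-ratio share."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # "+1" smoothing is an assumption
    total = {}
    for d in docs:
        for t, c in Counter(d).items():
            total[t] = total.get(t, 0.0) + (c / len(d)) * idf[t]
    ranked = sorted(total, key=lambda t: (-total[t], t))  # ties broken alphabetically
    keep = max(1, math.ceil(GOLDEN_RATIO * len(ranked)))  # hypothetical cut rule
    return ranked[:keep], idf

def vectorize(doc, vocab, idf):
    """TF-IDF vector over the kept vocabulary, scaled to unit L2 norm."""
    tf = Counter(doc)
    vec = [(tf[t] / len(doc)) * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def fit_spheres(vectors, labels):
    """One sphere per class: mean center, radius = farthest member (no kernel, no slack)."""
    spheres = {}
    for label in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == label]
        center = [sum(col) / len(members) for col in zip(*members)]
        radius = max(math.dist(center, m) for m in members)
        spheres[label] = (center, max(radius, 1e-12))  # guard against a zero radius
    return spheres

def classify(vec, spheres):
    """Assign to the class with the smallest relative distance d(x, center) / radius."""
    return min(spheres, key=lambda y: math.dist(vec, spheres[y][0]) / spheres[y][1])

# Toy usage with made-up, pre-tokenized documents.
train = [["goal", "goal", "team"], ["team", "match", "goal"],
         ["stock", "stock", "bank"], ["bank", "market", "stock"]]
labels = ["sports", "sports", "finance", "finance"]
vocab, idf = fit_vocab(train)
spheres = fit_spheres([vectorize(d, vocab, idf) for d in train], labels)
print(classify(vectorize(["goal", "team", "coach"], vocab, idf), spheres))
```

Dividing by the radius is what makes the distance "relative": a document is compared against each class on that class's own scale, so a compact class and a spread-out class are treated evenly. Training cost grows with the number of classes rather than with the number of class pairs, which is one plausible reading of the training-time advantage the abstract reports over one-against-one SVMs.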

Keywords: multiclass text classification; golden selection; support vector domain description; dimension reduction; massive text

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
