New text classification algorithm based on interdependence and equivalent radius  

Chinese title: Text classification method based on interdependence and equivalent radius (article in English)

Authors: Wang Hongwei (王洪伟)[1], Yi Lei (伊磊)[2], Wang Jianhui (王建会)[3]

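Affiliations: [1] School of Economics and Management, Tongji University, Shanghai 200092; [2] School of Mathematical Sciences, Fudan University, Shanghai 200433; [3] School of Information Science and Engineering, Fudan University, Shanghai 200433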

Source: Journal of Southeast University (English Edition) (东南大学学报(英文版)), 2007, No. 1, pp. 63-69 (7 pages)

Funding: The National Natural Science Foundation of China (No. 70501024, 70501022); the Humanity & Social Science Research Program of Ministry of Education of China (No. 05JC870013)

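Abstract: To improve on traditional classification methods, such as vector space model (VSM)-based methods, which suffer from high computational complexity and poor scalability, a new classification method (called IER) is presented, based on two new concepts: interdependence and equivalent radius. In IER, attributes are selected according to their interdependence values, and the classification rule is based on the equivalent radius and the center of gravity. Algorithm analysis shows that IER is well suited to classifying a large number of samples, with higher scalability and lower computational complexity. Experiments on classifying Chinese texts lead to the conclusion that IER outperforms the k-nearest neighbor (kNN) method and the classification based on the center of classes (CCC) method, so IER can be used online to classify a large number of samples automatically while maintaining higher precision and recall.

Abstract (translated from the Chinese): To address the high computational complexity and poor scalability of traditional classification methods, the concepts of interdependence and equivalent radius are proposed and combined into a new, easily updated classification algorithm, IER. IER uses interdependence as the measure for feature selection, reduces dimensionality by selecting the features with larger values, and builds the classification model from the center of gravity and the equivalent radius. Algorithm analysis shows that IER has low computational complexity and good scalability and is suited to large-scale settings. IER was applied to Chinese text classification and compared with the kNN algorithm and the class-center vector method. The results show that, while improving classification accuracy, IER can also greatly increase classification speed, which favors real-time, online automatic classification of large-scale information samples.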
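The abstract names the ingredients of the classification rule (a center of gravity and an equivalent radius per class) but does not give their formulas. The sketch below is therefore only an illustration of the general idea, not the authors' IER algorithm. It assumes that the center of gravity is the arithmetic mean of a class's feature vectors, that the "radius" is the root-mean-square distance of the class members from that center, and that a sample is assigned to the class with the smallest distance-to-radius ratio. The function names `fit` and `classify`, the toy data, and the `eps` guard are all hypothetical. The interdependence-based attribute selection is not sketched, since the abstract gives no definition to build on.

```python
import math


def centroid(vectors):
    """Arithmetic mean of a list of equal-length feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit(samples_by_class):
    """Build a {label: (center, radius)} model from labeled vectors."""
    model = {}
    for label, vectors in samples_by_class.items():
        center = centroid(vectors)
        # ASSUMPTION: the radius is the RMS distance of members to the center.
        # The paper's actual definition of "equivalent radius" may differ.
        mean_sq = sum(distance(v, center) ** 2 for v in vectors) / len(vectors)
        model[label] = (center, math.sqrt(mean_sq))
    return model


def classify(model, vector, eps=1e-12):
    """Pick the class whose center is nearest relative to its radius."""
    def score(label):
        center, radius = model[label]
        # eps avoids division by zero for single-sample classes.
        return distance(vector, center) / (radius + eps)
    return min(model, key=score)


# Toy two-dimensional "documents" (hypothetical data).
training = {
    "sports": [[1.0, 0.0], [0.8, 0.2]],
    "finance": [[0.0, 1.0], [0.2, 0.8]],
}
model = fit(training)
print(classify(model, [0.9, 0.1]))
```

Under these assumptions, classifying a sample costs one distance computation per class, whereas kNN compares the sample against every stored training vector. Adding a new class only requires computing one more center and radius.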

Keywords: classification; equivalent radius; vector space; interdependence; interdependence and equivalent radius

Classification number: TP139 [Automation and Computer Technology: Control Theory and Control Engineering]

 
