Large-Scale Hierarchical Classification Algorithm Based on Full-Path Similarity

Authors: ZHU Jian-lin [1], CHEN Zhong-yang [1], ZHANG Yong-jun [2], SUN Cun-yi

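Affiliations: [1] School of Finance, Renmin University of China, Beijing 100872, China; [2] School of Information, Renmin University of China, Beijing 100872, China; [3] Guanghua School of Management, Peking University, Beijing 100871, China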

Source: Computer Engineering and Design, 2019, No. 5, pp. 1300-1304, 1333 (6 pages)

Funding: National Natural Science Foundation of China (71271209); Beijing Natural Science Foundation (4132067)

Abstract: To classify texts into a large-scale class hierarchy quickly and accurately, a classification algorithm based on full-path similarity is proposed. The concept of word-class discrimination is introduced and used as the basis for computing class vectors. Using these class vectors, an improved Rocchio algorithm computes the similarity between the text to be classified and each target class, and the N most likely classes are selected as candidates. The full-path similarity between the text and each of the N candidates is then computed from the hierarchical structure of the classes, and this determines the final class. The time complexity of the algorithm grows linearly with the number of classes. Experimental results show that the method outperforms traditional algorithms, and that the full-path similarity strategy clearly improves on classification based on word-class discrimination alone.
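The abstract names the steps of the method but not its formulas, so the Python sketch below shows only one plausible way the two stages could fit together. Everything specific in it is an assumption made for illustration: the discrimination weight (a word's relative frequency in a class divided by its summed relative frequency across all classes at the same level), the plain cosine centroid score standing in for the paper's improved Rocchio similarity, the mean-over-path aggregation used as the full-path similarity, the hypothetical function names `train` and `classify`, and the `parent/child` string notation for classes.

```python
import math
from collections import Counter, defaultdict


def weighted_vectors(node_tf):
    """Hypothetical class vectors for the nodes at one level of the hierarchy.

    Weight of word w in node c = p * (p / total_w), where p is w's relative
    frequency in c and total_w sums p over all nodes at this level. The second
    factor is an assumed stand-in for the paper's word-class discrimination.
    """
    rel = {}
    for node, tf in node_tf.items():
        total = sum(tf.values())
        rel[node] = {w: n / total for w, n in tf.items()}
    totals = defaultdict(float)
    for freqs in rel.values():
        for w, p in freqs.items():
            totals[w] += p
    return {node: {w: p * p / totals[w] for w, p in freqs.items()}
            for node, freqs in rel.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def prefixes(path):
    """'a/b/c' -> ['a', 'a/b', 'a/b/c']: the nodes from the top level down."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def train(docs_by_leaf):
    """Build a vector for every node; internal nodes pool their leaves' words."""
    levels = defaultdict(lambda: defaultdict(Counter))
    for leaf, docs in docs_by_leaf.items():
        for depth, node in enumerate(prefixes(leaf)):
            for doc in docs:
                levels[depth][node].update(doc)
    vecs = {}
    for node_tf in levels.values():  # discrimination is computed per level
        vecs.update(weighted_vectors(node_tf))
    return vecs


def classify(doc, vecs, leaves, n=2):
    q = Counter(doc)
    # Stage 1: centroid (Rocchio-style) similarity selects the top-N leaves.
    candidates = sorted(leaves, key=lambda c: cosine(q, vecs[c]),
                        reverse=True)[:n]

    # Stage 2 (assumed aggregation): mean similarity to every node on the path.
    def full_path(leaf):
        nodes = prefixes(leaf)
        return sum(cosine(q, vecs[p]) for p in nodes) / len(nodes)

    return max(candidates, key=full_path)


if __name__ == "__main__":
    training = {
        "sports/football": [["goal", "match", "striker"], ["goal", "league"]],
        "sports/tennis": [["serve", "match", "set"], ["racket", "serve"]],
        "finance/stocks": [["share", "market", "index"], ["share", "price"]],
    }
    vecs = train(training)
    print(classify(["serve", "set", "match"], vecs, list(training)))
```

Computing the discrimination weights separately at each depth keeps a parent node from competing with its own children for the same words. In this sketch the first stage costs one similarity computation per leaf class, and the second stage only revisits the N candidates and their ancestors, which matches the linear growth in the number of classes that the abstract reports.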

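Keywords: word-class discrimination; full-path similarity; large-scale hierarchical classification; text classification; simplification strategy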

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
