Authors: ZHU Jian-lin (朱建林) [1]; CHEN Zhong-yang (陈忠阳) [1]; ZHANG Yong-jun (张永俊) [2]; SUN Cun-yi (孙存一)
Affiliations: [1] School of Finance, Renmin University of China, Beijing 100872, China; [2] School of Information, Renmin University of China, Beijing 100872, China; [3] Guanghua School of Management, Peking University, Beijing 100871, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2019, No. 5, pp. 1300-1304, 1333 (6 pages)
Funding: National Natural Science Foundation of China (71271209); Beijing Natural Science Foundation (4132067)
Abstract: To solve large-scale hierarchical classification problems quickly and accurately, a classification algorithm based on full-path similarity is proposed. The concept of word-class discrimination is introduced and used as the basis for computing class vectors. Using these class vectors, an improved Rocchio algorithm computes the similarity between the text to be classified and each target class, and the N most likely classes are selected as candidates. The full-path similarity between the text and each of the N candidate classes is then computed according to the hierarchical topology of the classes, and this determines the final class. The time complexity of the algorithm grows linearly with the number of classes. Experimental results show that the method outperforms traditional algorithms, and that the strategy based on text-class full-path similarity markedly improves on a classification algorithm based on word-class discrimination alone.
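Keywords: word-class discrimination; full-path similarity; large-scale hierarchical classification; text classification; simplification strategy
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]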
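The two-stage procedure described in the abstract (select N candidate classes by text-class similarity, then re-rank them by similarity along the full path from the root) can be illustrated with a minimal sketch. The abstract does not give the formulas for word-class discrimination, the improved Rocchio weighting, or full-path similarity, so everything below is an assumption made for illustration: class vectors are plain Rocchio-style centroids of term-weight dictionaries, similarity is cosine similarity, and the full-path score is the mean similarity over the nodes on the path. The names `cosine`, `centroid`, `path_to_root`, and `classify`, and the toy hierarchy, are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Plain centroid of document vectors (assumed stand-in for the paper's class vector)."""
    total = defaultdict(float)
    for d in docs:
        for t, w in d.items():
            total[t] += w
    return {t: w / len(docs) for t, w in total.items()}


def path_to_root(label, parent):
    """Return the list of nodes from the top-level ancestor down to `label`."""
    path = [label]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def classify(doc, class_vectors, parent, leaves, n=2):
    """Stage 1: keep the n most similar leaf classes. Stage 2: re-rank by full-path score."""
    candidates = sorted(
        leaves, key=lambda c: cosine(doc, class_vectors[c]), reverse=True
    )[:n]

    def full_path_score(c):
        # Assumption: mean similarity over every node on the path to the candidate.
        path = path_to_root(c, parent)
        return sum(cosine(doc, class_vectors[p]) for p in path) / len(path)

    return max(candidates, key=full_path_score)


# Toy hierarchy (hypothetical): sports -> {football, tennis}, tech -> {ai}.
parent = {"sports": None, "tech": None,
          "football": "sports", "tennis": "sports", "ai": "tech"}
train = {
    "football": [{"ball": 1.0, "goal": 1.0}],
    "tennis": [{"ball": 1.0, "racket": 1.0}],
    "ai": [{"model": 1.0, "data": 1.0}],
}
class_vectors = {c: centroid(docs) for c, docs in train.items()}
# Internal nodes take the centroid of all documents beneath them.
class_vectors["sports"] = centroid(train["football"] + train["tennis"])
class_vectors["tech"] = centroid(train["ai"])

predicted = classify({"goal": 1.0, "ball": 1.0}, class_vectors, parent,
                     leaves=["football", "tennis", "ai"], n=2)
print(predicted)
```

In this sketch, the second stage scores only the n candidates, so its cost depends on n and the path depth, while the first stage compares the text against each leaf class once. Whether this matches the paper's complexity analysis cannot be confirmed from the abstract alone.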