Engineering Text Information Classification Based on TFIDF and Classification Tree  Cited by: 3


Authors: Kong Qiuqiang (孔秋强)[1], He Qianhua (贺前华)[1]

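Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China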

Source: Computer Applications and Software (《计算机应用与软件》), 2014, No. 6, pp. 174-176, 191 (4 pages)

Abstract: Traditional classification algorithms cannot meet the needs of multi-level engineering information classification. To address this, the paper proposes a multi-level engineering information classification method based on term frequency-inverse document frequency (TFIDF) and a classification tree. A multi-level classification tree is generated for each piece of engineering information, and TFIDF matrices are built at the different levels, which reduces redundant computation. The classification result is decided from the similarities stored in the tree nodes. Compared with traditional single-level classification algorithms, the tree-based decision method can divide classes into multiple levels and assign an item to more than one class. Its computation time is only 59% of that of single-level classification, and it achieves a recall of 95.1% and an accuracy of 97.4%, showing good flexibility and robustness. Experimental results confirm the effectiveness of the algorithm.
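The abstract does not give the construction details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each tree node holds training tokens for its class, that a small TFIDF table is built over the children of the current node at every level, that cosine similarity is the stored similarity measure, and that a fixed threshold is the decision rule. The names `Node`, `tfidf_vectors`, `cosine`, `classify`, the smoothed IDF formula, the threshold values, and the toy category tree are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tokens: list = field(default_factory=list)    # training tokens for this class (assumed)
    children: list = field(default_factory=list)
    similarity: float = 0.0                        # filled in during classification

    def all_tokens(self):
        """Tokens of this node plus everything below it."""
        out = list(self.tokens)
        for child in self.children:
            out.extend(child.all_tokens())
        return out


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the paper's exact weighting is not stated).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        vecs.append({t: c / total * idf[t] for t, c in tf.items()})
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(node, query, threshold=0.3):
    """Descend the tree; return every path whose children pass the threshold."""
    if not node.children:
        return [[node.name]]
    # TFIDF is built only over the siblings at this level, not over all leaves.
    vecs, idf = tfidf_vectors([c.all_tokens() for c in node.children])
    tf = Counter(query)
    q = {t: c / len(query) * idf[t] for t, c in tf.items() if t in idf}
    paths = []
    for child, vec in zip(node.children, vecs):
        child.similarity = cosine(q, vec)          # similarity stored in the node
        if child.similarity >= threshold:
            paths += [[node.name] + p for p in classify(child, query, threshold)]
    return paths or [[node.name]]                  # stop here if no child qualifies


# Toy two-level category tree (illustrative data only).
root = Node("engineering", children=[
    Node("civil", children=[
        Node("bridge", tokens=["bridge", "span", "pier", "cable"]),
        Node("road", tokens=["asphalt", "road", "pavement", "lane"]),
    ]),
    Node("electrical", children=[
        Node("power", tokens=["transformer", "voltage", "grid", "cable"]),
        Node("lighting", tokens=["lamp", "led", "lighting", "fixture"]),
    ]),
])
query = ["bridge", "pier", "cable"]

if __name__ == "__main__":
    print(classify(root, query, threshold=0.3))
    print(classify(root, query, threshold=0.05))
```

In this sketch, a branch whose similarity falls below the threshold is never expanded, which is one plausible reading of how a tree can avoid redundant computation. Lowering the threshold lets a query follow several branches at once, which illustrates multi-class membership.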

Keywords: information classification; term frequency-inverse document frequency (TFIDF); classification tree

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
