WordNet based webpage classification system with category expansion (article in English)  Cited by: 1


Authors: Peng Xiaogang (彭小刚)[1], Ming Zhong (明仲)[1], Wang Haitao (王海涛)[1], Zhou Jingzhou (周景洲)[1]

Affiliation: [1] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Source: Journal of Shenzhen University (Science and Engineering), 2009, No. 2, pp. 116-120 (5 pages)

Funding: National Natural Science Foundation of China (60673122); Shenzhen Science and Technology Foundation (200740)

Abstract: Because the same meaning is often expressed in text with different words, sense-based webpage classification algorithms have been proposed as replacements for keyword-based algorithms, both to improve classification accuracy and to ease the retrieval of online information. Based on an analysis of the synset structure in WordNet, a sense-based webpage classification system was developed that uses the synsets together with the relations among them (the whole synset structure). To address the conflict between a fixed category structure and the growing number of documents added to the system, the system also supports category expansion through a category-based clustering algorithm. Experimental results show that the semantic hierarchy classification algorithm increases classification accuracy by 13% compared with existing sense-based classification algorithms, and that the category-based clustering algorithm yields higher-quality new categories than existing methods that use a similarity measure only.
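The motivation in the abstract, that different words can carry the same meaning, can be illustrated with a minimal sketch. It is not the authors' system: the lexicon `TOY_SENSES`, the sense IDs, and the helper functions are all hypothetical stand-ins, and a real implementation would look senses up in WordNet and also use the relations between synsets.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to one made-up sense ID. A real system would query
# WordNet synsets instead of using a hand-written table.
TOY_SENSES = {
    "car": "SENSE_CAR",
    "automobile": "SENSE_CAR",
    "buy": "SENSE_BUY",
    "purchase": "SENSE_BUY",
}

def keyword_features(text):
    """Bag of surface words, as a keyword-based classifier would see it."""
    return Counter(text.lower().split())

def sense_features(text):
    """Bag of sense IDs; words missing from the lexicon are kept as-is."""
    return Counter(TOY_SENSES.get(w, w) for w in text.lower().split())

def overlap(a, b):
    """Size of the multiset intersection of two feature bags."""
    return sum((a & b).values())

doc_a = "buy automobile"
doc_b = "purchase car"

keyword_overlap = overlap(keyword_features(doc_a), keyword_features(doc_b))
sense_overlap = overlap(sense_features(doc_a), sense_features(doc_b))
```

Under these assumptions the two documents share no keywords but share both senses, which is the gap that sense-based classification is meant to close.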

Keywords: information extraction; webpage classification; WordNet; sense-based classification; category expansion

Classification code: TP319 [Automation and Computer Technology: Computer Software and Theory]
