E-mail Classification Based on a Concept Vector Space Model


Authors: Zeng Chao (曾超)[1], Lü Zhao (吕钊)[1], Gu Junzhong (顾君忠)[1]

Affiliation: [1] School of Information Science and Technology, East China Normal University, Shanghai 200241, China

Source: Journal of Computer Applications (《计算机应用》), 2008, No. 12, pp. 3248-3250 (3 pages)

Abstract: An e-mail classification method based on the concept vector space model is proposed. When the feature vector of an e-mail is extracted, the method builds on the WordNet lexical ontology: terms are replaced by synonym-set (synset) concepts, and the hypernym-hyponym relations between synsets are taken into account. The resulting concept vector space model serves as the feature vector of the e-mail. The TF*IWF*IWF method is then used to adjust the weights of the concept vector, and the category of the e-mail is finally determined by the simple vector distance classification method. Experimental results show that, compared with the term-based VSM approach, the method effectively improves e-mail classification accuracy, especially when the training set is small.
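The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas or data, so several parts are assumptions made for illustration. The small `TERM_TO_CONCEPT` and `HYPERNYM` dictionaries are hypothetical stand-ins for WordNet lookups. The `HYPERNYM_WEIGHT` discount of 0.5 is an arbitrary choice. IWF is taken in its commonly used form, log(N / n_c), where N is the total number of concept occurrences in the training set and n_c is the number of occurrences of concept c. "Simple vector distance classification" is read here as nearest class centroid under cosine similarity. None of this should be taken as the authors' exact implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for WordNet: term -> synset id, synset -> hypernym.
TERM_TO_CONCEPT = {
    "buy": "purchase.v", "purchase": "purchase.v",
    "cheap": "inexpensive.a", "inexpensive": "inexpensive.a",
    "meeting": "meeting.n", "conference": "meeting.n",
    "report": "report.n", "paper": "report.n",
}
HYPERNYM = {"meeting.n": "gathering.n", "report.n": "document.n"}
HYPERNYM_WEIGHT = 0.5  # assumed discount for a parent concept


def concept_counts(tokens):
    """Replace terms by concepts; add a discounted count for each hypernym."""
    counts = Counter()
    for tok in tokens:
        concept = TERM_TO_CONCEPT.get(tok, tok)  # unknown terms kept as-is
        counts[concept] += 1.0
        parent = HYPERNYM.get(concept)
        if parent is not None:
            counts[parent] += HYPERNYM_WEIGHT
    return counts


def iwf_table(all_counts):
    """Assumed IWF definition: log(total occurrences / occurrences of c)."""
    totals = Counter()
    for counts in all_counts:
        totals.update(counts)
    n_all = sum(totals.values())
    return {c: math.log(n_all / n) for c, n in totals.items()}


def weight(counts, iwf):
    """TF * IWF * IWF; concepts unseen in training get weight 0."""
    return {c: tf * iwf.get(c, 0.0) ** 2 for c, tf in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled):
    """labelled: list of (tokens, label). Returns IWF table and class centroids."""
    counted = [(concept_counts(toks), label) for toks, label in labelled]
    iwf = iwf_table([counts for counts, _ in counted])
    sums, sizes = defaultdict(lambda: defaultdict(float)), Counter()
    for counts, label in counted:
        sizes[label] += 1
        for c, w in weight(counts, iwf).items():
            sums[label][c] += w
    centroids = {y: {c: w / sizes[y] for c, w in vec.items()}
                 for y, vec in sums.items()}
    return iwf, centroids


def classify(tokens, iwf, centroids):
    """Assign the label whose centroid is most similar to the e-mail vector."""
    vec = weight(concept_counts(tokens), iwf)
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))


training = [
    ("buy cheap pills".split(), "spam"),
    ("cheap offer buy now".split(), "spam"),
    ("meeting report attached".split(), "work"),
    ("conference paper deadline".split(), "work"),
]
iwf, centroids = train(training)
label = classify("purchase inexpensive watches".split(), iwf, centroids)
```

In this toy run, the query shares no literal word with any training message, so a purely term-based vector would have zero overlap with both classes. After concept mapping, "purchase" and "inexpensive" fall into the same concepts as "buy" and "cheap", which is the kind of effect the abstract credits for the gain on small training sets.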

Keywords: e-mail classification; WordNet; concept vector; vector space model

Classification number: TP393.098 [Automation and Computer Technology: Computer Application Technology]

 
