检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]华东师范大学信息科学技术学院,上海200241
出 处:《计算机应用》2008年第12期3248-3250,共3页journal of Computer Applications
摘 要:提出了一个基于概念向量空间模型的电子邮件分类方法。在提取电子邮件特征向量时,以WordNet语言本体库为基础,以同义词集合概念代替词条,同时考虑同义词集合间的上下位关系,从而建立电子邮件的概念向量空间模型作为电子邮件的特征向量。使用TF*IWF*IWF方法对概念向量进行权值修正,最后通过简单向量距离分类方法来确定电子邮件的类别。实验结果表明,当训练集合数目有限时,该方法能够有效提高电子邮件的分类准确率。A new approach of e-mail classification based on the concept vector space model was proposed. In this approach, the eigenvector of the e-mail was extracted during training process by replacing terms with synonymy sets in WordNet and considering hypernymy-hyponymy relation between synonymy sets. Then, TF * IWF * IWF method was used to revise the weight of the concept vector. In the end, the type of e-mail was determined using the simple vector classification method. Compared with the term-based VSM approach, the results show that this approach can improve the accuracy of e-mail classification especially when the size of training set is small.
关 键 词:电子邮件分类 WORDNET 概念向量 向量空间模型
分 类 号:TP393.098[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.87