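Author: GAI Xuan (盖璇) (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
Affiliation: [1] School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
Source: Computer and Modernization (《计算机与现代化》), 2020, No. 10, pp. 17-22 (6 pages)
Funding: Supported by the Northeast Petroleum University Guiding Innovation Fund (ky121728).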
Abstract: Existing spam recognition methods struggle to identify the key terms in an e-mail accurately, because the vocabulary they must segment now changes quickly and is highly varied; their practical applicability needs to improve. To this end, a spam recognition method based on a cluster analysis algorithm is proposed. First, the e-mail samples are preprocessed to obtain the key segmented terms of the text, stop words are removed, and each term is weighted according to its frequency in the e-mail text. Then, the e-mail's feature attributes are combined to construct an e-mail feature space, and the features are quantified. Finally, the e-mail features are extracted and reduced in dimensionality, then used as the input to the clustering algorithm, whose iterative computation produces the output that completes spam identification. Experimental results show that the proposed method extracts keywords and segments words more accurately and identifies spam accurately, indicating that its practical applicability has improved.
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
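
The pipeline outlined in the abstract (stop-word removal, frequency-based term weights, a reduced feature space, iterative clustering) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, keeping only the most frequent terms stands in for the paper's dimensionality reduction, and the stop-word list, function names, and sample e-mails are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's own list is not given in the abstract.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for",
              "you", "your", "at", "on", "we"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vectors(docs, max_features=8):
    """Weight each term by its relative frequency within an e-mail.

    Only the max_features most frequent terms are kept, a crude stand-in
    (assumption) for the dimensionality reduction step.
    """
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(w for toks in tokenized for w in toks)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [w for w, _ in ranked[:max_features]]
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        n = len(toks) or 1  # avoid division by zero for empty e-mails
        vectors.append([counts[w] / n for w in vocab])
    return vocab, vectors


def kmeans(vectors, k=2, max_iterations=20):
    """Plain k-means (assumed algorithm), seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample e-mails, for illustration only.
docs = [
    "Win free prize money now, claim your free prize",
    "Meeting agenda for the project review on Monday",
    "Free money offer: claim free prize today",
    "Project review meeting moved to Tuesday",
]
vocab, vectors = build_vectors(docs)
labels = kmeans(vectors)
print(labels)
```

Clustering only groups similar e-mails; deciding which cluster is spam would still require a few labeled examples or a manual inspection of each cluster's dominant terms.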