Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM  (Cited by: 11)

Authors: WU Si-Hui; CHEN Shi-Ping[2] (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)

Affiliations: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China

Published in: Computer Systems & Applications, 2020, No. 9, pp. 171–177 (7 pages)

Funding: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Research Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Shanghai Key Laboratory of Data Science Open Project (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001).

Abstract: As mobile phone text messaging has become an important means of daily communication, spam message recognition has significant practical value. To address this, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into a Bi-LSTM layer as word vectors; after feature extraction, the information from the TFIDF and self-attention layers is combined to obtain the final feature vector, which is then passed to a Softmax classifier to produce the classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves text recognition accuracy by 2.1%–4.6% and reduces running time by 0.6 s–10.2 s.
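The pipeline the abstract describes (word vectors → Bi-LSTM → TFIDF and self-attention weighting → Softmax) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: random vectors stand in for the Bi-LSTM hidden states and the learned attention query, the toy corpus is invented, and multiplying the attention weights by normalized TFIDF weights is one plausible reading of how the two layers' information is combined.

```python
import numpy as np

# Hypothetical toy corpus of tokenized SMS messages (assumption: the paper
# uses real word embeddings fed through a Bi-LSTM; here random vectors
# stand in for the Bi-LSTM hidden states).
corpus = [
    ["win", "free", "prize", "now"],
    ["see", "you", "at", "dinner"],
    ["free", "entry", "win", "cash"],
]

# --- TF-IDF weight per token (smoothed IDF) ---
vocab = sorted({w for doc in corpus for w in doc})
df = {w: sum(w in doc for doc in corpus) for w in vocab}
n_docs = len(corpus)
idf = {w: np.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}

def tfidf_weights(doc):
    """Per-token TF-IDF weights, normalized to sum to 1."""
    tf = {w: doc.count(w) / len(doc) for w in doc}
    raw = np.array([tf[w] * idf[w] for w in doc])
    return raw / raw.sum()

# --- TFIDF-modulated self-attention pooling (illustrative) ---
rng = np.random.default_rng(0)
hidden_dim = 8

def attention_pool(doc):
    # Stand-in for Bi-LSTM outputs: one hidden vector per token.
    H = rng.normal(size=(len(doc), hidden_dim))
    # Attention scores: dot product with a query vector (random here,
    # learned in the real model), followed by a softmax.
    q = rng.normal(size=hidden_dim)
    scores = H @ q
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    # Combine attention with TF-IDF weights and renormalize.
    w = attn * tfidf_weights(doc)
    w /= w.sum()
    return w @ H  # final feature vector, input to the Softmax classifier

feat = attention_pool(corpus[0])
print(feat.shape)  # (8,)
```

In the actual model the pooled feature vector would go through a Softmax-activated dense layer to yield the spam/ham decision; the key idea illustrated here is that rare, informative tokens get their attention weights boosted by high IDF.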

Keywords: spam messages; text classification; self-attention; Bi-LSTM; TFIDF

CLC Number: TP391.1 [Automation & Computer Technology—Computer Application Technology]; TP181 [Automation & Computer Technology—Computer Science & Technology]

 
