Spam Message Recognition Based on TFIDF and Self-Attention-Based Bi-LSTM  (Cited by: 11)

Authors: WU Si-Hui; CHEN Shi-Ping[2] (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)

Affiliations: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China

Published in: Computer Systems & Applications, 2020, No. 9, pp. 171–177 (7 pages)

Funding: National Natural Science Foundation of China (61472256, 61170277, 61003031); Shanghai Key Science and Technology Research Project (14511107902); Shanghai Engineering Center Construction Project (GCZXL14014); Shanghai First-Class Discipline Construction Project (S1201YLXK, XTKX2021.); Shanghai Key Laboratory of Data Science Open Project (201609060003); Hujiang Foundation (A14006); Hujiang Foundation Research Base Special Project (C14001).

Abstract: As mobile phone text messaging has become an important means of daily communication, spam message recognition has significant practical value. To address this, a self-attention-based Bi-LSTM neural network model combined with TFIDF is proposed. The model first feeds the message text into a Bi-LSTM layer as word vectors; after feature extraction, the information from the TFIDF and self-attention layers is combined to obtain the final feature vector, which is then passed to a Softmax classifier to produce the classification result. Experimental results show that, compared with traditional classification models, the self-attention-based Bi-LSTM model combined with TFIDF improves text recognition accuracy by 2.1%–4.6% and reduces running time by 0.6 s–10.2 s.
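The pipeline the abstract describes (word vectors → Bi-LSTM → TFIDF and self-attention weighting → Softmax) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: random vectors stand in for the Bi-LSTM hidden states and the learned attention query, the toy corpus is invented, and multiplying the attention weights by normalized TFIDF weights is one plausible reading of how the two layers' information is combined.

```python
import numpy as np

# Hypothetical toy corpus of tokenized SMS messages (assumption: the paper
# uses real word embeddings fed through a Bi-LSTM; here random vectors
# stand in for the Bi-LSTM hidden states).
corpus = [
    ["win", "free", "prize", "now"],
    ["see", "you", "at", "dinner"],
    ["free", "entry", "win", "cash"],
]

# --- TF-IDF weight per token (smoothed IDF) ---
vocab = sorted({w for doc in corpus for w in doc})
df = {w: sum(w in doc for doc in corpus) for w in vocab}
n_docs = len(corpus)
idf = {w: np.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}

def tfidf_weights(doc):
    """Per-token TF-IDF weights, normalized to sum to 1."""
    tf = {w: doc.count(w) / len(doc) for w in doc}
    raw = np.array([tf[w] * idf[w] for w in doc])
    return raw / raw.sum()

# --- TFIDF-modulated self-attention pooling (illustrative) ---
rng = np.random.default_rng(0)
hidden_dim = 8

def attention_pool(doc):
    # Stand-in for Bi-LSTM outputs: one hidden vector per token.
    H = rng.normal(size=(len(doc), hidden_dim))
    # Attention scores: dot product with a query vector (random here,
    # learned in the real model), followed by a softmax.
    q = rng.normal(size=hidden_dim)
    scores = H @ q
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    # Combine attention with TF-IDF weights and renormalize.
    w = attn * tfidf_weights(doc)
    w /= w.sum()
    return w @ H  # final feature vector, input to the Softmax classifier

feat = attention_pool(corpus[0])
print(feat.shape)  # (8,)
```

In the actual model the pooled feature vector would go through a Softmax-activated dense layer to yield the spam/ham decision; the key idea illustrated here is that rare, informative tokens get their attention weights boosted by high IDF.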

Keywords: spam messages; text classification; self-attention; Bi-LSTM; TFIDF

CLC Number: TP391.1 [Automation & Computer Technology—Computer Application Technology]; TP181 [Automation & Computer Technology—Computer Science & Technology]

 
