Authors: ZHAO Wei (赵伟)[1], DENG Ye-xun (邓叶勋), ZHAO Jian-qiang (赵建强), LI Wen-rui (李文瑞)[1], HAN Bing (韩冰)[1], OU Rong-an (欧荣安)[1]
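Affiliations: [1] Guangzhou Institute of Criminal Science and Technology, Guangzhou, Guangdong 510030, China; [2] Xiamen Meiya Pico Information Co., Ltd., Xiamen, Fujian 361008, China; [3] Xidian University, Xi'an, Shaanxi 710071, China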
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 3, pp. 65-69, 110 (6 pages in total)
Funding: Guangzhou Major Special Project for Scientific and Technological Research (201903007).
Abstract: The Internet is an important medium for advertising, but it is also flooded with low-quality, fraudulent, and illegal advertisements that seriously pollute cyberspace. Effective screening of malicious advertisements is therefore of great significance for building a safe and clean online environment. To meet the need to recognize the many kinds of illegal and non-compliant Chinese advertising content, we use Bert (bidirectional encoder representations from transformers) and Word2vec to extract character-level and word-level embedding features, respectively. A CNN (convolutional neural network) performs deep extraction on Bert's high-level features, while the word-level feature vectors are fed into a bidirectional LSTM (long short-term memory) network to extract global semantics, and an attention mechanism strengthens these semantic features. The enhanced features are then fused with the Bert character-level features, making full use of the complementary semantic representations of dynamic and static word vectors. On this basis, we propose CARES (Chinese advertisement text recognition based on enhanced semantic), a Chinese advertisement recognition model built on enhanced semantics. Experiments on a real-world dataset of social chat texts show that, compared with text classification models such as convolutional and recurrent neural networks, CARES achieves the best classification performance and recognizes advertising content in social chat text more accurately, reaching an accuracy of 97.73%.
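The data flow described in the abstract can be illustrated with a single forward pass. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: random arrays stand in for the Bert character embeddings and Word2vec word embeddings, all layer sizes are hypothetical, the attention is assumed to be additive (tanh) attention pooling, and "fusion" is assumed to be plain concatenation of the CNN output with the attention-pooled BiLSTM output. The function name `cares_like_forward` is hypothetical. Weights are untrained, so the output probabilities carry no meaning; only the shapes and the order of operations illustrate the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cnn_max_pool(x, filters, bias):
    """1-D convolution over the sequence axis, ReLU, then max-pooling over time.
    x: (seq_len, d); filters: (n_filters, k, d); bias: (n_filters,)."""
    n_filters, k, _ = filters.shape
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (T-k+1, k, d)
    conv = np.einsum("tkd,fkd->tf", windows, filters) + bias             # (T-k+1, n_filters)
    return np.maximum(conv, 0.0).max(axis=0)                              # (n_filters,)

def lstm(x, W, U, b):
    """Single-direction LSTM. x: (T, d); W: (d, 4h); U: (h, 4h); b: (4h,). Returns (T, h)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    outputs = []
    for x_t in x:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)  # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(x, fwd, bwd):
    """Concatenate a forward pass with a time-reversed backward pass: (T, 2h)."""
    h_f = lstm(x, *fwd)
    h_b = lstm(x[::-1], *bwd)[::-1]
    return np.concatenate([h_f, h_b], axis=1)

def attention_pool(H, W_a, v_a):
    """Additive attention (assumed form): score_t = v_a . tanh(H_t W_a), weights = softmax."""
    alpha = softmax(np.tanh(H @ W_a) @ v_a)  # (T,)
    return alpha @ H, alpha                  # (2h,), (T,)

def init_lstm(d, h):
    return (rng.normal(0, 0.1, (d, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h))

# Hypothetical sizes; the abstract does not state the paper's hyperparameters.
D_BERT, D_W2V, N_FILTERS, KERNEL, HIDDEN, N_CLASSES = 768, 300, 64, 3, 128, 2

params = {
    "filters": rng.normal(0, 0.02, (N_FILTERS, KERNEL, D_BERT)),
    "conv_b": np.zeros(N_FILTERS),
    "fwd": init_lstm(D_W2V, HIDDEN),
    "bwd": init_lstm(D_W2V, HIDDEN),
    "W_a": rng.normal(0, 0.1, (2 * HIDDEN, 2 * HIDDEN)),
    "v_a": rng.normal(0, 0.1, 2 * HIDDEN),
    "W_out": rng.normal(0, 0.1, (N_FILTERS + 2 * HIDDEN, N_CLASSES)),
    "b_out": np.zeros(N_CLASSES),
}

def cares_like_forward(char_emb, word_emb, p):
    """Hypothetical CARES-style pass: CNN on char features, BiLSTM + attention on word features."""
    char_feat = cnn_max_pool(char_emb, p["filters"], p["conv_b"])  # from Bert char embeddings
    H = bilstm(word_emb, p["fwd"], p["bwd"])                         # from Word2vec word embeddings
    word_feat, alpha = attention_pool(H, p["W_a"], p["v_a"])
    fused = np.concatenate([char_feat, word_feat])                   # fusion assumed = concatenation
    probs = softmax(fused @ p["W_out"] + p["b_out"])                 # e.g. [non-ad, ad]
    return probs, alpha

# Random stand-ins for Bert char embeddings (20 characters) and Word2vec embeddings (8 words).
char_emb = rng.normal(size=(20, D_BERT))
word_emb = rng.normal(size=(8, D_W2V))
probs, alpha = cares_like_forward(char_emb, word_emb, params)
print(probs, alpha.shape)
```

Before the output means anything, the random arrays would have to be replaced by real model outputs (for example, the last hidden layer of a Chinese Bert model and a trained Word2vec lookup over segmented words) and all weights trained on labeled chat texts.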
Keywords: advertisement text classification; semantic enhancement; feature fusion; pre-training; attention mechanism
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]