A Text-Based Captcha Recognition Method Based on a Conditional Generative Adversarial Network (cited by: 9)

A Recognition Method for Text-Based Captcha Based on CGAN

Authors: TANG Zhan-Yong [1,2]; TIAN Chao-Xiong; YE Gui-Xin; LI Jing; WANG Wei; GONG Xiao-Qing [1,2]; CHEN Xiao-Jiang; FANG Ding-Yi [1,2]

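Affiliations: [1] School of Information Science and Technology, Northwest University, Xi'an 710127; [2] Shaanxi International Joint Research Centre for the Battery-free Internet of Things, Xi'an 710127; [3] School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong, Shaanxi 723001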

Source: Chinese Journal of Computers, 2020, No. 8, pp. 1572-1588 (17 pages)

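Funding: Supported by the National Natural Science Foundation of China (61672427, 61972314); the International Cooperation Program of Shaanxi Province (2017KW-008, 2019KW-009); the Key Research and Development Program of Shaanxi Province (2017GY-191); and the Innovation Team of Shaanxi Province (2018SD0011).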

Abstract: Captchas are widely used in website login and registration to strengthen authentication and to prevent automated attacks by computer programs. Text-based captchas are used by most mainstream websites because of their large password space and simple interaction. To make automatic recognition harder, current designs generally combine several security features at random: complex interfering backgrounds, character distortion, rotation and overlapping, and multiple fonts. Because these features are combined, traditional recognition methods achieve very low recognition rates on such captchas, or fail outright. For this class of text-based captchas, this paper proposes a generic recognition method based on a conditional generative adversarial network (CGAN). The method uses the CGAN to remove background interference from the captcha and to stretch the spacing between its characters, producing a captcha that is free of interference and of overlapping characters. The captcha is then segmented with the optimized combination of segmentation algorithms proposed in this paper, and the resulting single characters are recognized with GoogLeNet. Because real captchas are hard to obtain cheaply in large numbers, the authors also wrote a program that synthesizes captchas for training the network; the training cost is far lower than that of existing methods, and the training results are good. The experimental results show that the proposed method successfully recognizes the captchas of major international websites such as Microsoft, Wikipedia, Baidu, Alipay, and Sina, improving the recognition rate over traditional methods by up to 70.2%.

Captchas are widely used in website login and registration to enhance authentication and prevent automated attacks from computer programs. Captchas can be divided into three categories: text-based, image-based, and audio-based. Among the three schemes, text-based captchas are used extensively by most mainstream websites because of their large password space and simple interaction mode. Because text-based captchas are so widely deployed, a compromise of the captcha scheme can have significant implications and could result in serious consequences. At present, to protect text-based captchas against automatic recognition by computer programs, designers generally use a random combination of security features, such as complex interfering backgrounds and character warping, rotation, and overlapping. Although researchers have proposed a number of attacks, text-based captchas are still used by most popular websites, such as Google, Microsoft, and Alipay. One reason is that previous model-based attacks are scheme-specific: a small change to the captcha, such as a noisier background or more overlapping characters, can easily invalidate a prior attack. The other reason is that prior deep-learning-based attacks require millions of training samples, and collecting and labelling that many captchas is labor-intensive and time-consuming. To address these challenges, we present a generic generative adversarial network-based attack on text-based captchas. Unlike previous machine-learning-based attacks, our approach does not require a large volume of real captchas to learn an effective solver; it significantly reduces the number of real captchas needed. This is achieved by first using a CGAN to remove the background interference and stretch the character spacing of the captchas. We then use an optimized combination of segmentation algorithms to segment the stretched captchas effectively, and use GoogLeNet to perform single cha…
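The segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which segmentation algorithms the authors combine, so the code below substitutes plain vertical-projection segmentation, a standard technique that only works once characters no longer touch (the condition the CGAN stage is meant to establish). The binary-image representation and the names `column_profile`, `segment_columns`, and `crop` are assumptions made for illustration, not part of the paper.

```python
from typing import List, Tuple

# Assumption: the cleaned captcha is a binary image, 1 = ink, 0 = background.
Image = List[List[int]]


def column_profile(img: Image) -> List[int]:
    """Count ink pixels in every column (the vertical projection)."""
    if not img:
        return []
    width = len(img[0])
    return [sum(row[x] for row in img) for x in range(width)]


def segment_columns(img: Image, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of consecutive ink columns.

    Runs narrower than min_width are dropped as likely noise.
    """
    profile = column_profile(img)
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # a new run of ink columns begins
        elif count == 0 and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None  # the run ended at an empty column
    # Close a run that reaches the right edge of the image.
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))
    return spans


def crop(img: Image, span: Tuple[int, int]) -> Image:
    """Cut one character out of the image by its column range."""
    lo, hi = span
    return [row[lo:hi] for row in img]
```

Each cropped region would then be passed to a single-character classifier (GoogLeNet in the paper). With overlapping characters the projection never drops to zero between glyphs, which is why stretching the character spacing first matters for this kind of segmentation.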

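Keywords: text-based captcha; captcha recognition; conditional generative adversarial network; character segmentation; interference-removal algorithm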

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
