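Affiliations: [1] School of Information Science and Technology, Northwest University, Xi'an 710127; [2] Shaanxi International Joint Research Centre for the Battery-Free Internet of Things, Xi'an 710127; [3] School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong, Shaanxi 723001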
Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 8, pp. 1572-1588 (17 pages)
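Funding: Supported by the National Natural Science Foundation of China (61672427, 61972314); the Shaanxi International Cooperation Program (2017KW-008, 2019KW-009); the Shaanxi Key Research and Development Program (2017GY-191); and the Shaanxi Innovation Team program (2018SD0011).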
Abstract: Captchas are widely used in website login, registration, and similar steps to strengthen authentication and to prevent automated attacks by computer programs. Text-based captchas are used by most mainstream websites because of their large password space and simple interaction. To make automatic recognition harder, current designs commonly combine security features at random, such as complex interference, character distortion, rotation and overlapping, and varied fonts. Because multiple security features are combined, traditional recognition methods achieve very low recognition rates on such captchas or fail entirely. For this class of text-based captchas, this paper proposes a generic recognition method based on a conditional generative adversarial network (CGAN). The method uses the CGAN to remove background interference and to stretch the spacing between characters, producing captchas free of interference and character overlap. The captchas are then segmented with a combination of segmentation algorithms optimized in this paper, and each segmented character is recognized with GoogLeNet. Because real captchas are hard to collect in large numbers at low cost, the authors designed a program that simulates captchas to train the networks; the training cost is far lower than that of existing methods, and the training results are good. Experiments show that the proposed method successfully recognizes captchas from major international websites including Microsoft, Wikipedia, Baidu, Alipay, and Sina, with the recognition rate improving over traditional methods by up to 70.2%.

Captchas are widely used in website login and registration to enhance authentication and prevent automated attacks by computer programs. Captchas can be divided into three categories: text-based, image-based, and audio-based. Among these, text-based captchas are used extensively by most mainstream websites because of their large password space and simple interaction mode. Given this wide deployment, a compromise of a captcha scheme can have serious consequences. At present, to protect text-based captchas against automatic recognition by computer programs, designers generally use a random combination of security features, such as complex background interference and character warping, rotation, and overlapping. Although researchers have proposed a number of attacks, text-based captchas are still used by most popular websites, such as Google, Microsoft, and Alipay. One reason is that previous model-based attacks are scheme-specific: a small change in the captcha, such as a noisier background or more overlapping characters, can easily invalidate a prior attack. The other reason is that prior deep-learning-based attacks require millions of training samples, and collecting and labelling that many captchas is labor-intensive and time-consuming. To address these challenges, we present a generic generative adversarial network-based attack on text-based captchas. Unlike previous machine-learning-based attacks, our approach does not require a large volume of real captchas to learn an effective solver; it significantly reduces the number of real captchas needed. This is achieved by first using a CGAN to remove background interference and stretch the character spacing of the captchas. We then segment the stretched captchas with an optimized combination of segmentation algorithms and use GoogLeNet to perform single cha
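Keywords: text-based captcha; captcha recognition; conditional generative adversarial network; character segmentation; interference-removal algorithm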
Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]
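The abstract describes a segmentation step applied after the CGAN has removed interference and separated the characters. As an illustration of why that preprocessing helps, here is a minimal sketch of vertical-projection segmentation on a binarized image. It is not the authors' optimized combination of segmentation algorithms, which the abstract does not specify; the function name `segment_by_projection`, the `min_width` parameter, and the toy image are all assumptions made for this example.

```python
import numpy as np


def segment_by_projection(binary, min_width=1):
    """Return (start, end) column ranges of ink runs in a binarized image.

    `binary` is a 2-D array with 1 for ink and 0 for background.
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    # A column belongs to a character if it contains any ink pixel.
    column_has_ink = binary.sum(axis=0) > 0
    segments = []
    start = None
    for x, has_ink in enumerate(column_has_ink):
        if has_ink and start is None:
            start = x  # a new run of ink columns begins
        elif not has_ink and start is not None:
            if x - start >= min_width:
                segments.append((start, x))  # end index is exclusive
            start = None
    # Close a run that reaches the right edge of the image.
    if start is not None and binary.shape[1] - start >= min_width:
        segments.append((start, binary.shape[1]))
    return segments


# Toy image with three well-separated "characters" (assumed data).
image = np.zeros((5, 12), dtype=np.uint8)
image[1:4, 1:3] = 1
image[0:5, 5:8] = 1
image[2:4, 10:12] = 1

segments = segment_by_projection(image)
wide_only = segment_by_projection(image, min_width=3)
```

Projection-based splitting of this kind only works when blank columns separate the characters, which is presumably why the method stretches character spacing before segmenting.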