Lightweight and Efficient Recognition Method for Chinese Character Click-based CAPTCHA


Authors: JIN Xinhao (金鑫豪), CHI Kaikai (池凯凯) [1]

Affiliation: [1] School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310013, China

Source: Computer Science (《计算机科学》), 2024, Issue S02, pp. 289-297 (9 pages)

Funding: National Natural Science Foundation of China, General Program (62272414).

Abstract: Amid the wave of digitalization, enterprises increasingly rely on robotic process automation (RPA) to reduce costs and improve efficiency, and thus remain competitive. However, some process steps require recognizing Chinese character click-based CAPTCHAs, which limits further gains in automation. Existing approaches suffer from costly dataset creation, poor model generalization, and an imbalance between model complexity and performance. To address these issues, this paper proposes a lightweight Chinese character click-based CAPTCHA recognition method with low dataset creation cost and good generalization. Specifically, a purpose-modified, significantly lighter YOLOv8-n model first detects the Chinese characters. The detected character images are then preprocessed by segmentation and rectification. Next, the highly generalizable PaddleOCR model recognizes the characters, which reduces the cost of adapting to new scenarios, and the best match is obtained from the recognition probability matrix, further improving accuracy. In addition, a semi-automatic pipeline for constructing Chinese character detection datasets is designed, and the dataset is made publicly available. This research aims to advance automatic recognition of Chinese character click-based CAPTCHAs and to raise the level of enterprise process automation.
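The abstract does not say how the recognition probability matrix is turned into a final match, so the following is only a minimal sketch under a stated assumption: each prompted character must be assigned to a distinct detected crop, and the assignment with the highest joint probability wins. The names `best_match` and `click_points`, the matrix layout, and the box format are hypothetical and are not taken from the paper. Because a click-based CAPTCHA contains only a handful of characters, an exhaustive search over assignments is affordable here.

```python
from itertools import permutations
from math import log


def best_match(prob, eps=1e-12):
    """Pick one distinct crop per target character, maximizing joint probability.

    prob[i][j] is ASSUMED to be the recognizer's probability that detected
    crop j shows target character i (layout is hypothetical).
    Returns a tuple `assign` where assign[i] is the crop index for target i.
    """
    n_targets = len(prob)
    n_crops = len(prob[0])
    if n_targets > n_crops:
        raise ValueError("fewer detected crops than target characters")

    best_score, best_assign = float("-inf"), None
    # Exhaustive search: fine for the few characters a CAPTCHA contains.
    for assign in permutations(range(n_crops), n_targets):
        # Sum of log-probabilities equals the log of the joint probability.
        score = sum(log(prob[i][j] + eps) for i, j in enumerate(assign))
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign


def click_points(assign, boxes):
    """Return box centres in target order; boxes[j] = (x1, y1, x2, y2)."""
    return [
        ((boxes[j][0] + boxes[j][2]) / 2, (boxes[j][1] + boxes[j][3]) / 2)
        for j in assign
    ]


# Illustrative values only: targets 0 and 1 both score highest on crop 0,
# so a per-row argmax would click the same crop twice.
prob = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
assignment = best_match(prob)
points = click_points(assignment, boxes)
```

Matching on the whole matrix rather than taking each character's top prediction resolves conflicts where two prompted characters favor the same crop, which is one plausible reason such a step could improve accuracy.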

Keywords: process automation; CAPTCHA recognition; YOLOv8; PaddleOCR; lightweight

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
