Authors: JIN Xinhao, CHI Kaikai[1]
Affiliation: [1] School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310013, China
Source: Computer Science (《计算机科学》), 2024, No. S02, pp. 289-297 (9 pages)
Funding: General Program of the National Natural Science Foundation of China (62272414).
Abstract: Amid the wave of digitalization, enterprises increasingly rely on robotic process automation (RPA) to reduce costs and improve efficiency, and thereby stay competitive. However, some process steps require solving Chinese character click-based CAPTCHAs, which limits further gains in automation. Existing approaches suffer from costly dataset creation, poor model generalization, and an imbalance between model complexity and performance. To address these issues, this paper proposes a lightweight Chinese character click-based CAPTCHA recognition method with low dataset creation cost and good generalization. Specifically, a YOLOv8-n model with targeted improvements, made significantly more lightweight, is first used to detect the Chinese characters. The detected character images then undergo preprocessing such as segmentation and rectification. Next, the highly generalizable PaddleOCR model recognizes the characters, which lowers the cost of adapting to new scenarios, and the best match is obtained from the recognition probability matrix, further improving accuracy. In addition, a semi-automatic pipeline for constructing a Chinese character detection dataset is designed, and the dataset is made publicly available. This research aims to advance automatic recognition of Chinese character click-based CAPTCHAs and to raise the level of enterprise process automation.
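The abstract states only that the best match is obtained from a recognition probability matrix; it does not give the matching rule. The following is a minimal sketch under two assumptions that are not taken from the paper: (a) for each detected character crop, the recognizer yields a probability for each target character the CAPTCHA asks the user to click, and (b) "best match" means the one-to-one assignment of targets to crops that maximizes the joint probability (equivalently, the sum of log-probabilities). The function name `best_match`, the matrix layout, and the exhaustive search are hypothetical illustrative choices. With only a few target characters, enumerating permutations is cheap; a Hungarian-algorithm solver would scale better for larger matrices.

```python
import itertools
import math


def best_match(prob: list[list[float]], eps: float = 1e-12) -> tuple[list[int], float]:
    """Assign each target character to a distinct detected crop.

    Hypothetical layout (an assumption, not the paper's definition):
    prob[i][j] is the recognizer's probability that detected crop j shows
    target character i. Returns (assignment, score), where assignment[i] is
    the crop index chosen for target i and score is the summed
    log-probability of that assignment.
    """
    n_targets = len(prob)
    n_crops = len(prob[0]) if prob else 0
    if n_targets > n_crops:
        raise ValueError("fewer detected crops than target characters")

    best_assignment: list[int] = []
    best_score = -math.inf
    # Exhaustive search over ordered selections of distinct crops.
    for perm in itertools.permutations(range(n_crops), n_targets):
        # Clamp zeros to eps so log() is defined; maximizing the log-sum
        # is equivalent to maximizing the product of probabilities.
        score = sum(math.log(max(prob[i][j], eps)) for i, j in enumerate(perm))
        if score > best_score:
            best_assignment, best_score = list(perm), score
    return best_assignment, best_score


if __name__ == "__main__":
    # Made-up probabilities: 3 target characters, 4 detected crops (one distractor).
    prob = [
        [0.90, 0.80, 0.01, 0.02],  # target 0
        [0.85, 0.05, 0.10, 0.03],  # target 1
        [0.02, 0.03, 0.05, 0.88],  # target 2
    ]
    assignment, score = best_match(prob)
    print(assignment)  # [1, 0, 3]
```

In this made-up example, taking the per-row maximum independently would send both target 0 and target 1 to crop 0; requiring each crop to be used at most once resolves the conflict by assigning crop 1 to target 0.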
Keywords: process automation; CAPTCHA recognition; YOLOv8; PaddleOCR; lightweight
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]