Chinese character recognition based on convolutional neural network and character encoding  Cited by: 6


Authors: Liu Zhengqiong[1], Ding Li, Ling Lin[2], Li Xuefei, Zhou Wenxia

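Affiliations: [1] School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; [2] School of Mechanical Engineering, Hefei University of Technology, Hefei 230009, China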

Source: Journal of Electronic Measurement and Instrumentation, 2020, No. 2, pp. 143-149 (7 pages)

Funding: Supported by the Anhui Provincial Science and Technology Research Program (1604a0902182).

Abstract: Chinese character recognition is an important research topic in artificial intelligence and pattern recognition. Existing approaches still suffer from difficult parameter tuning, small training sets, and an inability to recognize all commonly used characters. To address these problems, this paper proposes a Chinese character recognition method based on character encoding and a convolutional neural network. All character information is first obtained by querying the font library; character images are then generated using UTF-8 encoding together with multiple font encoding files, and several image-processing operations are applied to produce the dataset. A deep convolutional neural network is trained on this dataset for recognition. Training is improved with data augmentation, batch normalization, and RMSProp optimization, while regularization and Dropout are added to prevent overfitting. Experimental results show that the proposed method is simple yet effective, achieving a recognition rate of 98.08% on Chinese characters; on the same dataset, this is 9.37% and 21.14% higher than AlexNet and LeNet-5, respectively. The result is a neural network with a high recognition rate and strong feature-extraction and generalization ability.
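The dataset-generation step described in the abstract (enumerate the characters, encode them as UTF-8, render each one with several fonts) could be sketched roughly as follows. This is an illustrative sketch and not the authors' code: the abstract does not say which character set was queried, so the GB2312 level-1 set (3,755 common characters) is an assumption, and `gb2312_level1_chars`, `build_label_table`, `render_char`, and the `font_path` argument are hypothetical names. Rendering relies on the Pillow library and a font file supplied by the user.

```python
from typing import Dict, Iterator, List, Tuple


def gb2312_level1_chars() -> Iterator[str]:
    """Yield GB2312 level-1 Chinese characters (an assumed character set)."""
    for high in range(0xB0, 0xD8):       # GB2312 rows 16..55
        for low in range(0xA1, 0xFF):    # columns 1..94
            try:
                yield bytes([high, low]).decode("gb2312")
            except UnicodeDecodeError:
                # Unassigned positions (the tail of row 55) are skipped.
                continue


def build_label_table(chars: List[str]) -> Dict[str, Tuple[int, bytes]]:
    """Map each character to (class index, UTF-8 byte sequence)."""
    return {ch: (idx, ch.encode("utf-8")) for idx, ch in enumerate(chars)}


def render_char(ch: str, font_path: str, size: int = 64):
    """Render one character as a grayscale image; requires Pillow."""
    from PIL import Image, ImageDraw, ImageFont  # imported lazily

    font = ImageFont.truetype(font_path, int(size * 0.8))
    img = Image.new("L", (size, size), color=255)  # white background
    # Draw the glyph in black with a small margin from the top-left corner.
    ImageDraw.Draw(img).text((size // 10, size // 10), ch, font=font, fill=0)
    return img


chars = list(gb2312_level1_chars())
labels = build_label_table(chars)
```

Calling `render_char(ch, font_path)` once per character and per font file would give one image per (character, font) pair; the image processing and augmentation mentioned in the abstract would then be applied to those images before training.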

Keywords: Chinese character recognition; convolutional neural network; character encoding; overfitting; batch normalization

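Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]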

 
