Multi-Font Tibetan Printed Character Recognition Based on Neural Network  (Cited by: 2)


Authors: SAN Zhi-jia (三知加); GONG Qu-zhuo-me (贡去卓么); CAI Rang-jia (才让加) [1,2,3,4]; ZHUO Ma-zha-xi (卓玛扎西)

Affiliations: [1] School of Computer Science, Qinghai Normal University, Xining, Qinghai 810008, China; [2] Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining, Qinghai 810008, China; [3] Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining, Qinghai 810008, China; [4] Tibetan Information Processing Engineering Technology and Research Center of Qinghai Province, Xining, Qinghai 810008, China

Source: Computer Simulation (《计算机仿真》), 2022, No. 10, pp. 214-218 (5 pages)

Abstract: To address the shortage of multi-font Tibetan character datasets and the problem of recognizing multi-font printed Tibetan characters, this paper constructs the Tibetan Printed Character Dataset (TPCD), containing 48,960 character images, and preprocesses it by labeling, normalization, and binarization. Recognition experiments are run on the dataset with a range of linear statistical and deep learning methods, including support vector machines, feedforward neural networks, and convolutional neural networks. On the test set, the proposed neural-network-based model reaches a recognition rate, recall, and F1 score of 97%, 96.6%, and 96.6%, respectively, which confirms the effectiveness of the method and provides a theoretical and research foundation for subsequent Tibetan character recognition.
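The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, pure-Python sketch. The record does not give the target image size, the binarization threshold, or how recall and F1 are averaged across classes, so the nearest-neighbour resize, the fixed threshold of 128, and the macro averaging below are all assumptions made for illustration; the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def resize_nearest(img, out_h, out_w):
    """Size normalization: nearest-neighbour resize of a 2-D list of grey values."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def binarize(img, threshold=128):
    """Binarization: 1 for ink (darker than the assumed threshold), 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        p = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        r = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy 2x2 grey image scaled to 4x4 and binarized (illustrative data only).
    toy = [[0, 255], [255, 0]]
    print(binarize(resize_nearest(toy, 4, 4)))
    # Toy labels standing in for character classes.
    print(macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In practice an image library would handle resizing and thresholding; the plain-list version is used here only to keep the sketch self-contained.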

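Keywords: Tibetan printed character dataset; convolutional neural network; multi-font; Tibetan characters; printed character recognition; Tibetan components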

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
