Korean Ancient Book Character Recognition Method Based on Unified Chinese-Korean Ideographic Description Sequence (IDS) Encoding


Authors: ZHAO Mengling; JIN Xiaofeng [1] (College of Integration Science, Yanbian University, Yanji 133002, China)

Affiliation: [1] College of Integration Science, Yanbian University, Yanji, Jilin 133002, China

Source: Journal of Yanbian University (Natural Science Edition), 2024, No. 2, pp. 101-106 (6 pages)

Funding: Humanities and Social Sciences Basic Research Project of the Education Department of Jilin Province (JJKH20230608SK).

Abstract: To address the difficulty of recognizing mixed Chinese and Korean characters in ancient Korean books, this paper proposes a unified ideographic description sequence (IDS) encoding scheme for Chinese and Korean characters, so that ancient Korean book text can be recognized with the radical-decomposition-based character recognition model CCR-CLIP (Chinese character recognition with contrastive language-image pre-training). First, based on the structural similarity of Chinese and Korean characters, the Chinese radicals, Korean alphabet letters, and 12 basic structures that occur in the characters are encoded uniformly. Second, the IDS sequence file for Chinese characters provided with the original CCR-CLIP model is extended with IDS sequences for Korean characters. Finally, the scarcity of ancient Korean book samples is addressed by training on printed characters. The results show that, compared with the CCR-SLD method, the proposed method improves character recognition accuracy by 13.8% in the ancient Korean book experiment and by 5.38% in the printed text experiment. The method outperforms the other methods compared on ancient Korean text recognition and can serve as a reference for further work on this problem.
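The abstract does not reproduce the actual code table, so the following is only a minimal sketch of what a unified IDS encoding could look like. It assumes that Korean syllables are decomposed into their letters with the standard Unicode arithmetic for precomposed Hangul, and that the layout is written with the Unicode Ideographic Description Characters ⿰ (left-right) and ⿱ (top-bottom), in the same way as Chinese IDS entries. The layout rules, the function names `hangul_to_ids` and `to_ids`, and the two-entry `HANZI_IDS` table are hypothetical illustrations, not the paper's scheme and not the IDS file shipped with CCR-CLIP.

```python
# Illustrative sketch only: the layout rules and the tiny Hanzi table below are
# assumptions, not the paper's code table or the CCR-CLIP IDS file.

# Letters of precomposed Hangul syllables, in Unicode order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 28 finals, "" = none

# Vowels written below the initial consonant (top-bottom layout).
HORIZONTAL = set("ㅗㅛㅜㅠㅡ")
# Compound vowels split into a part below and a part to the right.
COMPOUND = {"ㅘ": "ㅗㅏ", "ㅙ": "ㅗㅐ", "ㅚ": "ㅗㅣ", "ㅝ": "ㅜㅓ",
            "ㅞ": "ㅜㅔ", "ㅟ": "ㅜㅣ", "ㅢ": "ㅡㅣ"}

# Hypothetical sample of Chinese IDS entries (a real table has thousands).
HANZI_IDS = {"好": "⿰女子", "字": "⿱宀子"}

HANGUL_BASE, HANGUL_COUNT = 0xAC00, 11172


def hangul_to_ids(ch: str) -> str:
    """Encode one precomposed Hangul syllable as an IDS-style string."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < HANGUL_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    lead = LEADS[index // 588]            # 588 = 21 vowels * 28 finals
    vowel = VOWELS[(index % 588) // 28]
    tail = TAILS[index % 28]

    if vowel in COMPOUND:
        below, right = COMPOUND[vowel]
        body = "⿰⿱" + lead + below + right
    elif vowel in HORIZONTAL:
        body = "⿱" + lead + vowel
    else:                                 # vertical vowels sit to the right
        body = "⿰" + lead + vowel
    # A final consonant, if present, sits under the whole body.
    return "⿱" + body + tail if tail else body


def to_ids(ch: str) -> str:
    """Unified lookup: Hangul by rule, Hanzi by table, anything else as-is."""
    if HANGUL_BASE <= ord(ch) < HANGUL_BASE + HANGUL_COUNT:
        return hangul_to_ids(ch)
    return HANZI_IDS.get(ch, ch)


print([to_ids(c) for c in "한국好"])
# ['⿱⿰ㅎㅏㄴ', '⿱⿱ㄱㅜㄱ', '⿰女子']
```

Under these assumptions both scripts share one vocabulary of structure symbols and components, which is what would let a single decomposition-based recognizer handle Chinese and Korean characters on the same page.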

Keywords: Korean ancient books; zero-shot; character recognition; character encoding; ideographic description sequence

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
