Damaged Inscription Recognition Based on Hierarchical Decomposition Embedding and Bipartite Graph


Authors: LIN Guangfeng; WU Na; HE Menglan; ZHANG Erhu; SUN Qiang [2]

Affiliations: [1] Faculty of Printing, Packaging and Digital Media, Xi'an University of Technology, Xi'an 710048, China; [2] Faculty of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China

Source: Journal of Electronics & Information Technology, 2024, No. 2, pp. 564-573 (10 pages)

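Funding: National Natural Science Foundation of China (61771386); Key Research and Development Program of Shaanxi Province (2020SF-359); Natural Science Basic Research Program of Shaanxi Province (2021JM-340).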

Abstract: Ancient inscriptions carry rich historical and cultural information, but natural weathering, erosion, and man-made destruction have left the text on many stone tablets incomplete. The semantics of ancient inscriptions are diverse and examples are scarce, so it is very difficult to learn contextual semantics well enough to complete and recognize damaged characters. This paper instead approaches the task of completing and recognizing damaged Chinese characters through spatial semantic modeling of character glyphs. Building on the Hierarchical Decomposition Embedding (HDE) encoding method, it uses dynamic graph repair embedding (DynamicGrape) to map the image of the character to be recognized into features and to determine whether the character is damaged. If the character is not damaged, its image is converted directly into an HDE code, which is fed to a bipartite graph that infers the edge weights from character nodes to component nodes; the character is then recognized by comparison against the codes in the character library. If the character is damaged, candidate characters and components are retrieved from the library, the feature dimensions of the character code are selected, and the bipartite graph infers the likely character. Experiments on a self-built dataset and on the Chinese Text in the Wild (CTW) dataset show that the bipartite graph network can effectively transfer and infer the glyph information of damaged characters, and that the proposed method can effectively recognize damaged Chinese characters. The work opens up a new approach to processing damaged structural information.
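The two branches described in the abstract can be illustrated with a toy example. The sketch below is not the authors' DynamicGrape model or the real HDE code: it assumes a tiny hypothetical character library, flat (non-hierarchical) component sets, binary edge weights, and a simple masked-distance score in place of learned graph inference. The names `LIBRARY`, `build_edges`, and `rank_candidates` are invented for illustration.

```python
# Toy character library: each character maps to its components.
# Hypothetical stand-in for an HDE-style code; real HDE is hierarchical.
LIBRARY = {
    "好": {"女", "子"},
    "明": {"日", "月"},
    "时": {"日", "寸"},
    "林": {"木"},
    "休": {"亻", "木"},
}


def build_edges(library):
    """Return (sorted component list, {char: binary edge-weight vector})."""
    components = sorted({c for parts in library.values() for c in parts})
    edges = {
        ch: [1.0 if comp in parts else 0.0 for comp in components]
        for ch, parts in library.items()
    }
    return components, edges


def rank_candidates(observed, components, edges):
    """Rank library characters against a partially observed weight vector.

    `observed` maps component -> estimated edge weight in [0, 1].
    Components missing from `observed` are treated as damaged (unknown)
    and are skipped, which mimics selecting the usable feature dimensions.
    """
    known = [i for i, comp in enumerate(components) if comp in observed]
    scores = {}
    for ch, vec in edges.items():
        # Mean absolute error on the surviving dimensions only.
        err = sum(abs(vec[i] - observed[components[i]]) for i in known)
        scores[ch] = 1.0 - err / len(known) if known else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


components, edges = build_edges(LIBRARY)

# Damaged case: the left part (日) survives, the right part is unreadable,
# so the weights for 月 and 寸 are simply absent.
damaged = {"日": 0.9, "女": 0.0, "子": 0.0, "木": 0.1, "亻": 0.0}
print(rank_candidates(damaged, components, edges)[:2])

# Undamaged case: every dimension is observed.
intact = dict(damaged, **{"月": 1.0, "寸": 0.0})
print(rank_candidates(intact, components, edges)[0])
```

In the damaged case 明 and 时 tie at the top because both contain 日 and the dimensions that would separate them are masked; once all dimensions are observed, 明 ranks first. In the paper this disambiguation is done by bipartite graph inference rather than by a fixed distance.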

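Keywords: damaged inscription; inscription prediction; inscription recognition; damaged character recognition; bipartite graph neural network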

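Classification: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]; TP18 [Electronics and Telecommunications: Information and Communication Engineering]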

 
