Authors: Zhang Ce (张策); Wang Weilan (王维兰)[1]
Affiliations: [1] Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China; [2] School of Mathematics and Information Engineering, Chongqing University of Education, Chongqing 400065, China
Source: Laser & Optoelectronics Progress (《激光与光电子学进展》), 2021, No. 20, pp. 252-267 (16 pages)
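Funding: National Natural Science Foundation of China (61772430); Innovation Team Program of the National Ethnic Affairs Commission of China ((2018) No. 98); Outstanding Graduate Student "Innovation Star" Project (2021CXZX-663); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202101608); University-level Research Project of Chongqing University of Education (KY202118C).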
Abstract: Character segmentation is an important step in the analysis and recognition of historical Tibetan document images. To address slanted text lines, overlapping, crossing, and touching strokes between characters, and varying degrees of stroke breakage and noise interference in historical Uchen Tibetan documents, this paper proposes a character segmentation method based on structural attributes. First, a character block dataset of historical Uchen Tibetan documents is established. Then, the local baseline of each character block is detected using either the position of the syllable point or a combination of horizontal projection and line detection, and the block is split along the baseline into upper and lower parts. An improved template matching algorithm detects touching strokes above the baseline and their types, and a multi-direction, multi-path touching segmentation algorithm separates crossing and touching strokes. Finally, each stroke is assigned to a character according to the structural attributes of Tibetan script, completing the character segmentation. Experimental results show that the method effectively handles these problems, with the recall, precision, and F-measure of character segmentation reaching 96.52%, 98.24%, and 97.37%, respectively.
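Keywords: image processing; historical Tibetan documents; character block; local baseline; touching detection and segmentation; stroke attribution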
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
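Note on the reported metrics: the abstract gives recall, precision, and F-measure but not the formula used. The sketch below assumes the standard balanced F-measure (the harmonic mean of precision and recall), under which the three reported figures are mutually consistent. The function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the paper uses the standard F1 definition; the
    abstract does not state the formula explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_recall = 0.9652
reported_precision = 0.9824

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {100 * f1:.2f}%")  # prints "F-measure: 97.37%"
```

The computed value rounds to the 97.37% stated in the abstract, which suggests the balanced F-measure is the one reported.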