Handwritten Text Segmentation Method Based on Greedy Snake Algorithm and Radical Recognition


Authors: FU Pengbin (付鹏斌), DONG Aojing (董澳静), YANG Huirong (杨惠荣) (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)

Affiliation: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124

Source: Journal of South China University of Technology (Natural Science Edition) (《华南理工大学学报(自然科学版)》), 2022, No. 1, pp. 80-90 (11 pages)

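Funding: National Natural Science Foundation of China (61772048); Beijing Natural Science Foundation (4153058); Beijing Municipal Education Commission Teaching Reform and Innovation Project (040000514120521).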

Abstract: To address interlacing, adhesion, and intra-character over-segmentation in handwritten Chinese text, a text segmentation method based on the greedy snake algorithm and radical recognition is proposed. First, the initial segmentation trajectories of the text are established with the greedy snake algorithm, and the segmentation paths are optimized according to multiple rules. Next, candidate adhesion points are extracted from the contours and skeletons of the touching characters, and the greedy snake algorithm is applied again for a secondary segmentation. Finally, for over-segmented characters, the stroke segments of radicals are extracted and recognized, the merging direction is determined from the structure of the Chinese character, and the merge is completed by combining geometric confidence with recognition confidence, which yields the final segmentation result. The algorithm was validated on 1542 lines of handwritten text taken from the test papers of a high school in Shaanxi Province. The results show that its segmentation accuracy can reach 82.15%.

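Keywords: handwritten Chinese text; touching (adhesion) characters; greedy snake; over-segmentation merging; radical recognition; stroke segment extraction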

Classification No.: TP391.43 [Automation and Computer Technology: Computer Application Technology]
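
The abstract names a snake-style path as the first segmentation stage but does not state its movement rules. The following Python sketch is therefore only an illustration under assumptions, not the authors' algorithm: the image is a binary grid (1 = ink, 0 = background), the snake moves from the top row to the bottom row, slides sideways through background to get around a stroke, and cuts straight through ink only when no detour exists within `max_shift` columns. The names `snake_cut`, `detour`, and `max_shift` are hypothetical.

```python
def detour(img, row, col, step, max_shift):
    """Walk sideways along `row` through background pixels only.

    Returns the first column from which the pixel directly below is
    background, or None if ink or the image border blocks the walk.
    """
    width = len(img[0])
    c = col
    for _ in range(max_shift):
        c += step
        if not (0 <= c < width) or img[row][c] == 1:
            return None  # blocked by the border or by ink
        if img[row + 1][c] == 0:
            return c
    return None


def snake_cut(img, start_col, max_shift=3):
    """Trace a top-to-bottom cut path (assumed rules, see lead-in).

    Returns (path, crossed): the visited (row, col) points and the
    number of ink pixels the path had to cut through.
    """
    height = len(img)
    col = start_col
    path = [(0, col)]
    crossed = img[0][col]
    for row in range(height - 1):
        if img[row + 1][col] == 1:
            left = detour(img, row, col, -1, max_shift)
            right = detour(img, row, col, +1, max_shift)
            options = [c for c in (left, right) if c is not None]
            if options:
                # Take the nearer detour; on a tie, prefer the left one.
                target = min(options, key=lambda c: abs(c - col))
                step = 1 if target > col else -1
                while col != target:
                    col += step
                    path.append((row, col))
            else:
                crossed += 1  # no way around: cut through the stroke
        path.append((row + 1, col))
    return path, crossed


if __name__ == "__main__":
    demo = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(snake_cut(demo, 2))
```

A path with a nonzero `crossed` count passes through ink, which is one plausible signal that the region contains touching characters and needs the secondary segmentation described in the abstract. That link is likewise an assumption of this sketch.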

 
