Study on the Segmentation Method of Handwritten Characters From Historical Chinese Documents


Authors: Zhang Zhonglin[1], Wu Xiangjin, Zhou Shenglong[2]

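Affiliations: [1] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China; [2] Gansu Provincial Library, Lanzhou, Gansu 730000, China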

Source: Journal of Zhengzhou University (Engineering Science), 2015, No. 6, pp. 70-75 (6 pages)

Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (14YJA870014)

Abstract: Historical Chinese documents are written in vertical columns, and their handwritten characters frequently overlap and touch one another. To handle these characteristics, this paper proposes column (text line) segmentation and character segmentation methods suited to historical documents. Column segmentation uses a statistical projection method with loop filtering: the document is first projected in the vertical direction, and the resulting statistics are then filtered repeatedly until fairly uniform columns are separated. The algorithm performs well even in complex cases such as heavy noise, slight skew, and uneven column heights. Character segmentation is a multi-step method that combines projection, piecewise projection, and the segmentation method of stroke features at top and bottom (SM-SFTB); on top of these, a context-based check verifies the result and adjusts incorrect cuts. Piecewise projection applies the idea of bisection: a segment containing touching or overlapping characters is divided into left and right halves, each half is projected separately, and the projection arrays are analyzed to obtain the segmentation path. SM-SFTB uses the characteristics of the strokes at the top and bottom of Chinese characters to detect over-segmentation and under-segmentation, and the cuts are adjusted accordingly. Experimental results show that the proposed methods segment handwritten historical Chinese documents well.
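The column step described in the abstract (vertical projection followed by repeated filtering) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The page is assumed to be a binarized image given as a list of rows of 0/1 values (1 = ink). The abstract does not specify the filter, so the "loop filtering" rule used here is a hypothetical stand-in: runs narrower than half the median run width are dropped, the median is recomputed, and the process repeats until the run list stops changing. The names `vertical_projection`, `runs_above` and `segment_columns`, and the parameters `noise_thresh`, `min_ratio` and `max_iter`, are illustrative.

```python
from statistics import median


def vertical_projection(img):
    """Count ink pixels (1s) in each pixel column x of a binarized page."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def runs_above(profile, thresh):
    """Return half-open (start, end) ranges where profile[x] > thresh."""
    runs, start = [], None
    for x, v in enumerate(profile):
        if v > thresh and start is None:
            start = x
        elif v <= thresh and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_columns(img, noise_thresh=0, min_ratio=0.5, max_iter=10):
    """Hypothetical loop filter: drop runs narrower than min_ratio * median
    width, recompute the median, and repeat until the run list stops changing."""
    runs = runs_above(vertical_projection(img), noise_thresh)
    for _ in range(max_iter):
        if not runs:
            break
        med = median(end - start for start, end in runs)
        kept = [(s, e) for s, e in runs if e - s >= min_ratio * med]
        if kept == runs:
            break
        runs = kept
    return runs


# Two 3-pixel-wide text columns with one noise pixel between them.
img = [[0] * 12 for _ in range(4)]
for r in range(4):
    for x in (1, 2, 3, 8, 9, 10):
        img[r][x] = 1
img[0][6] = 1  # isolated noise pixel
print(segment_columns(img))  # [(1, 4), (8, 11)]
```

Filtering on run width relative to the median, rather than on a fixed width, is one way to let the rule adapt to each page's column size. A real system would likely also need to handle skew and gaps inside a column, which this sketch ignores.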
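The bisection idea behind piecewise projection can be illustrated with a small example. When two vertically stacked characters overlap, no full-width row is blank, but each half of the segment may still have its own blank row. The sketch below is one reading of that idea, not the authors' code. The segment is assumed to be a binarized sub-image of a single column, and `lo`/`hi` bound the rows searched for a cut. Picking the least-ink row in each half, with ties broken toward the middle, is an assumed decision rule. The functions `row_profile`, `best_cut` and `piecewise_cut` are hypothetical names.

```python
def row_profile(img, x0, x1):
    """Ink count per row, restricted to pixel columns x0 <= x < x1."""
    return [sum(row[x0:x1]) for row in img]


def best_cut(profile, lo, hi):
    """Row in [lo, hi) with the least ink; ties go to the row nearest the middle."""
    mid = (lo + hi - 1) / 2
    return min(range(lo, hi), key=lambda y: (profile[y], abs(y - mid)))


def piecewise_cut(img, lo, hi):
    """Bisect the segment into left/right halves, project each half separately,
    and return one cut row per pixel column (a two-piece segmentation path)."""
    width = len(img[0])
    half = width // 2
    y_left = best_cut(row_profile(img, 0, half), lo, hi)
    y_right = best_cut(row_profile(img, half, width), lo, hi)
    return [y_left if x < half else y_right for x in range(width)]


# Two characters whose strokes overlap vertically: no full-width row is blank,
# but the left half is blank at row 4 and the right half at row 2.
seg = [[0] * 6 for _ in range(7)]
for y in (0, 1, 2, 3, 5, 6):
    seg[y][1] = 1
for y in (0, 1, 3, 4, 5, 6):
    seg[y][4] = 1
print(min(row_profile(seg, 0, 6)))  # 1, so no single horizontal cut is ink-free
print(piecewise_cut(seg, 1, 6))     # [4, 4, 4, 2, 2, 2]
```

The returned path steps from row 4 to row 2 at the midpoint, which lets the cut pass between the overlapping strokes.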
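The abstract does not say how the context check or SM-SFTB actually detect over- and under-segmentation. The sketch below therefore substitutes a simple size heuristic and is explicitly not the paper's stroke-feature rule. Each character segment's height is compared with the median height of segments in the same column, which serves as the "context". Short pieces are merged with a neighbor when the union still looks like one character, and overly tall pieces are flagged for another split. The function name `adjust_segments` and the ratios `short_ratio`, `merge_ratio` and `tall_ratio` are hypothetical.

```python
from statistics import median


def adjust_segments(segments, short_ratio=0.6, merge_ratio=1.2, tall_ratio=1.6):
    """segments: half-open (top, bottom) row ranges of one column, top to bottom.

    Hypothetical size heuristic (not the paper's SM-SFTB rule):
    - when either of two adjacent pieces is shorter than short_ratio * median,
      they are treated as a possible over-segmentation and merged if the union
      still fits within merge_ratio * median;
    - a piece taller than tall_ratio * median is flagged as a possible
      under-segmentation that needs another split.
    """
    if not segments:
        return [], []
    med = median(bottom - top for top, bottom in segments)
    merged = []
    for top, bottom in segments:
        if merged:
            prev_top, prev_bottom = merged[-1]
            either_short = min(bottom - top, prev_bottom - prev_top) < short_ratio * med
            if either_short and bottom - prev_top <= merge_ratio * med:
                merged[-1] = (prev_top, bottom)
                continue
        merged.append((top, bottom))
    too_tall = [(t, b) for t, b in merged if b - t > tall_ratio * med]
    return merged, too_tall


segs = [(0, 10), (12, 22), (24, 28), (29, 34), (36, 46), (48, 70)]
merged, too_tall = adjust_segments(segs)
print(merged)    # [(0, 10), (12, 22), (24, 34), (36, 46), (48, 70)]
print(too_tall)  # [(48, 70)]
```

In this toy column, the two short pieces (24, 28) and (29, 34) are rejoined into one character-sized segment. The 22-row segment at the bottom is flagged as likely holding two characters.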

Keywords: historical documents; handwritten Chinese characters; Chinese character segmentation; segmentation algorithm

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
