Printed Uyghur Texts Segmentation (印刷维吾尔文本切割)

Cited by: 17

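Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084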

Source: Journal of Chinese Information Processing (《中文信息学报》), 2005, No. 5, pp. 76-83 (8 pages)

Funding: National Natural Science Foundation of China (grant 60241005)

Abstract: Uyghur, as used in the Xinjiang Uyghur Autonomous Region of China, is written with letters borrowed from the Arabic script. Because the script is cursive, along with other characteristics of Arabic lettering, segmenting and recognizing Uyghur text is very difficult. This paper first classifies connected components, then combines horizontal projection with connected-component analysis to segment Uyghur text into lines and words. Next, the baseline position of each word is located, and the distance between the word contour and the baseline is computed to find all candidate cut points, which over-segments the word. Finally, over-segmented characters are merged according to rules. Experiments show that character segmentation accuracy exceeds 99%.
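The abstract gives only an outline of the pipeline, so the following is a minimal sketch of three of its steps (line segmentation by horizontal projection, baseline location, and candidate cut-point search), not the authors' implementation. It assumes a binary image stored as a list of rows with 1 for ink, takes the baseline to be the row with the most ink, and treats a column as a candidate cut when its upper and lower contours both lie within `max_distance` rows of the baseline. All function names and the `max_distance` parameter are hypothetical. Connected-component classification and the rule-based merging of over-segmented pieces are not shown, because the abstract does not specify them.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image):
    """Return (top, bottom) row ranges, bottom exclusive, split at blank rows."""
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def estimate_baseline(word):
    """Assumption: the baseline is the row with the largest ink count."""
    profile = horizontal_projection(word)
    return max(range(len(profile)), key=profile.__getitem__)


def candidate_cut_columns(word, baseline, max_distance=1):
    """Columns whose upper and lower contours both stay near the baseline."""
    cuts = []
    for x in range(len(word[0])):
        ink_rows = [y for y in range(len(word)) if word[y][x]]
        if not ink_rows:
            continue                       # empty column: nothing to cut
        upper, lower = min(ink_rows), max(ink_rows)
        if baseline - upper <= max_distance and lower - baseline <= max_distance:
            cuts.append(x)
    return cuts


# Toy word: two vertical strokes joined by a horizontal connecting stroke.
word = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
baseline = estimate_baseline(word)
cuts = candidate_cut_columns(word, baseline)
```

Adjacent candidate columns such as 2, 3 and 4 in the toy word belong to one connecting stroke, which is why a search of this kind over-segments and a later merging step is needed. Pure projection also splits dots and other detached marks from their base letters when a blank row separates them, which is presumably one reason the paper classifies connected components first.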

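Keywords: computer application; Chinese information processing; text segmentation; character segmentation; character recognition; Uyghur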

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
