Multi-font multi-size printed Uyghur character recognition  Cited by: 18


Authors: Wang Hua (王华)[1], Ding Xiaoqing (丁晓青)[1], Halmurat (哈力木拉提)[2]

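Affiliations: [1] Department of Electronic Engineering, Tsinghua University, Beijing 100084; [2] School of Information, Xinjiang University, Urumqi 830046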

Source: Journal of Tsinghua University (Science and Technology), 2004, No. 7, pp. 946-949 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (60241005)

Abstract: Research on Uyghur character recognition has high theoretical value and broad application prospects. This paper proposes a new recognition method for multi-font, multi-size printed Uyghur characters. Pre-classification information is first used to divide the entire character set into several subsets. Two recognition schemes are then applied, normalizing the input character to a 32×32 and a 24×24 dot matrix, respectively. Directional line element features are extracted from each normalized image and, after compression and dimensionality reduction, classified with a modified quadratic discriminant function (MQDF). The results of the two schemes are integrated on the basis of an overall confidence value. Finally, structural and local features are used to discriminate between similar characters. On a test set of 48,800 characters the recognition rate reaches 99.48%, which demonstrates the effectiveness of the method.
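
The abstract names the modified quadratic discriminant function (MQDF) but gives no formula. As a minimal sketch, assuming the commonly used MQDF2 form (the k dominant eigenvalues of each class covariance matrix are kept and the remaining minor eigenvalues are replaced by a constant delta), the classification step might look like the following. The function names, the toy class parameters, and the value of delta are illustrative assumptions and are not taken from the paper.

```python
import math


def mqdf_distance(x, mean, eigvals, eigvecs, delta):
    """MQDF2-style distance from feature vector x to one class (smaller is better).

    eigvals / eigvecs: the k dominant eigenvalues and matching unit eigenvectors
    of the class covariance matrix. delta stands in for the n - k minor eigenvalues.
    """
    n, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sq_norm = sum(d * d for d in diff)
    # Squared projections of (x - mean) onto the k principal axes.
    proj_sq = [sum(v * d for v, d in zip(vec, diff)) ** 2 for vec in eigvecs]
    principal = sum(p / lam for p, lam in zip(proj_sq, eigvals))
    # Energy outside the principal subspace, scaled by the constant delta.
    residual = (sq_norm - sum(proj_sq)) / delta
    log_det = sum(math.log(lam) for lam in eigvals) + (n - k) * math.log(delta)
    return principal + residual + log_det


def classify(x, classes, delta):
    """Return the label whose MQDF distance to x is smallest."""
    return min(classes, key=lambda label: mqdf_distance(x, *classes[label], delta))


# Hypothetical 2-D toy classes: label -> (mean, dominant eigenvalues, eigenvectors).
toy_classes = {
    "A": ([0.0, 0.0], [4.0], [[1.0, 0.0]]),
    "B": ([5.0, 5.0], [4.0], [[0.0, 1.0]]),
}
predicted = classify([1.0, 0.5], toy_classes, delta=1.0)
```

In a pipeline like the one described, one such classifier would be trained per normalization scheme (32×32 and 24×24), and the two outputs would then be combined by a confidence measure. The abstract does not specify that measure, so it is left out of this sketch.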

Keywords: Uyghur character recognition; directional line element feature; similar character discrimination

Classification code: TP391.43 [Automation and Computer Technology: Computer Application Technology]

 
