Multi-Font Printed Tibetan Character Recognition (多字体印刷藏文字符识别)  Cited by: 19

Authors: Wang Hua [1], Ding Xiaoqing [1]

Affiliation: [1] Department of Electronic Engineering, Tsinghua University, Beijing 100084

Source: Journal of Chinese Information Processing (《中文信息学报》), 2003, No. 6, pp. 47-52 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (60241005)

Abstract: Tibetan character recognition is an important component of Chinese multilingual information processing systems, yet research on it in China and abroad has so far been almost nonexistent. This paper proposes a multi-font printed Tibetan character recognition method based on statistical pattern recognition. Directional line element features are first extracted from the character contour and then compressed by Linear Discriminant Analysis (LDA) into a compact feature vector. A two-stage classification strategy based on confidence analysis is adopted: a Euclidean Distance with Deviation (EDD) classifier is designed for efficient coarse classification, and the Modified Quadratic Discriminant Function (MQDF) is used for fine classification. With classifier parameters selected experimentally, the method achieves a recognition rate of 99.79% on a test set of 177,600 characters (300 samples per character class). The experimental results show the validity of the proposed method.
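The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions. Inputs are taken to be feature vectors that have already been extracted and reduced by LDA. The coarse stage uses plain Euclidean distance to the class means in place of EDD, whose exact form the abstract does not give. The confidence measure (relative gap between the two nearest class means) and the parameters `n_candidates`, `conf_threshold`, `k`, and `delta` are hypothetical choices. The fine stage uses the standard MQDF form with `k` dominant eigenpairs per class and a constant `delta` substituted for the minor eigenvalues.

```python
import numpy as np

def fit_class_models(X, y, k, delta):
    """Estimate per-class mean and the top-k eigenpairs of the class covariance."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
        top = np.argsort(vals)[::-1][:k]                       # k largest eigenvalues
        lam = np.maximum(vals[top], delta)                     # floor at delta
        models[int(c)] = (mu, lam, vecs[:, top])
    return models

def mqdf(x, mu, lam, phi, delta):
    """MQDF score (smaller is better): k principal axes, delta for the rest."""
    diff = x - mu
    proj = phi.T @ diff                       # coordinates on the k principal axes
    residual = diff @ diff - proj @ proj      # energy outside the principal subspace
    d, k = x.shape[0], lam.shape[0]
    return float(np.sum(proj ** 2 / lam) + residual / delta
                 + np.sum(np.log(lam)) + (d - k) * np.log(delta))

def classify(x, models, delta, n_candidates=3, conf_threshold=0.5):
    """Coarse Euclidean stage; fall back to MQDF when confidence is low."""
    labels = list(models)
    dists = np.array([np.linalg.norm(x - models[c][0]) for c in labels])
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    # Hypothetical confidence: relative gap between the two nearest class means.
    confidence = 1.0 - best / second if second > 0 else 0.0
    if confidence >= conf_threshold:
        return labels[order[0]], "coarse"
    candidates = [labels[i] for i in order[:n_candidates]]
    scores = [mqdf(x, *models[c], delta) for c in candidates]
    return candidates[int(np.argmin(scores))], "fine"
```

In this sketch the cheaper distance test settles clear-cut inputs, and only ambiguous ones pay for the quadratic discriminant on a short candidate list. That matches the efficiency motivation stated in the abstract, though the actual decision rule in the paper may differ.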

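Keywords: artificial intelligence; pattern recognition; Tibetan character recognition; directional line element features; linear discriminant analysis; Euclidean distance with deviation; modified quadratic discriminant function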

Classification code: TP391.43 [Automation and Computer Technology: Computer Application Technology]

 
