Character model optimization for segmentation-free Uyghur text line recognition  Cited by: 3

Authors: Jiang Zhiwei[1], Ding Xiaoqing[1], Peng Liangrui[1]

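Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China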

Source: Journal of Tsinghua University (Science and Technology), 2015, No. 8, pp. 873-877, 883 (6 pages)

Funding: National Key Basic Research Program of China ("973" Program) (2013CB329403)

Abstract: Segmentation-free text line recognition based on the hidden Markov model (HMM) treats the text line as a probabilistic graph, so that segmentation and recognition of the text line image are carried out simultaneously. This avoids recognition errors caused by failed character presegmentation. However, the approach places high demands on the design and training of the character models, and the generalization performance of the models is difficult to improve when multiple fonts are merged. By analyzing what the model states mean as clusters at the image level, this paper first proposes a model structure optimization method based on reasonable clustering of the observations, then proposes a character model optimization strategy that combines structure and parameters, and finally applies the strategy to a segmentation-free recognition system for multi-font Uyghur text lines. Experimental results show that the method makes the allocation of model states more reasonable and, in the multi-font setting, improves both the generalization performance and the state utilization efficiency of the character models.
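The idea of segmenting and recognizing in one pass can be illustrated with a minimal Viterbi sketch. It is not the authors' system: the two character models, their probabilities, the quantized frame symbols, and the names `CHAR_MODELS` and `decode` are all hypothetical, chosen only to show how the best state path through concatenated left-to-right character HMMs yields character labels and their frame boundaries together.

```python
import math

# Hypothetical toy character models (not from the paper). Each character is a
# left-to-right HMM; emit[s][o] = P(frame symbol o | state s), and "stay" is
# the self-loop probability of every state in that model.
CHAR_MODELS = {
    "A": {"emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]], "stay": 0.6},
    "B": {"emit": [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]], "stay": 0.6},
}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def _transition(models, src, dst):
    """Transition probability between two (char, state_index) pairs."""
    (src_char, i), (dst_char, j) = src, dst
    model = models[src_char]
    last = len(model["emit"]) - 1
    if src == dst:                            # self-loop
        return model["stay"]
    if src_char == dst_char and j == i + 1:   # advance inside a character
        return 1.0 - model["stay"]
    if i == last and j == 0:                  # leave a character, enter any other
        return (1.0 - model["stay"]) / len(models)
    return 0.0


def decode(models, frames):
    """Viterbi decoding; returns a list of (char, first_frame, last_frame)."""
    states = [(c, i) for c, m in models.items() for i in range(len(m["emit"]))]
    n = len(states)
    starts = [k for k, (_, i) in enumerate(states) if i == 0]

    # Initialisation: a line must begin in the first state of some character.
    score = [float("-inf")] * n
    for k in starts:
        c, i = states[k]
        score[k] = _log(1.0 / len(starts)) + _log(models[c]["emit"][i][frames[0]])

    back = []  # back[t][d] = best predecessor of state d at frame t + 1
    for obs in frames[1:]:
        new_score, ptr = [float("-inf")] * n, [0] * n
        for d in range(n):
            best_s, best_v = 0, float("-inf")
            for s in range(n):
                v = score[s] + _log(_transition(models, states[s], states[d]))
                if v > best_v:
                    best_s, best_v = s, v
            c, i = states[d]
            new_score[d] = best_v + _log(models[c]["emit"][i][obs])
            ptr[d] = best_s
        back.append(ptr)
        score = new_score

    # Termination: a line must end in the last state of some character.
    ends = [k for k, (c, i) in enumerate(states)
            if i == len(models[c]["emit"]) - 1]
    k = max(ends, key=lambda e: score[e])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()

    # Entering a first state from a different state marks a character boundary.
    segments = []
    for t, k in enumerate(path):
        c, i = states[k]
        if t == 0 or (i == 0 and path[t - 1] != k):
            segments.append([c, t, t])
        else:
            segments[-1][2] = t
    return [tuple(seg) for seg in segments]


print(decode(CHAR_MODELS, [0, 0, 1, 1, 2, 2, 0, 0]))
# [('A', 0, 3), ('B', 4, 7)]
```

In this toy setting the number of states per character is fixed by hand. The abstract's point is that this allocation, together with the model parameters, is what the proposed optimization adjusts.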

Keywords: information processing; character recognition; hidden Markov model; statistical learning; Uyghur

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
