Uyghur and Arabic Recognition Methods Based on HMM and Statistical Language Model (cited by: 6)


Authors: 努尔艾力·喀迪尔[1], 彭良瑞[1], 哈力木拉提[2]

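Affiliations: [1] Department of Electronic Engineering, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China. [2] School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China.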

Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 1, pp. 171-174 (4 pages).

Funding: National Natural Science Foundation of China (61032008; 61261130590; 61163031; 60872086; 60863009).

Abstract: Uyghur and Arabic are cursive scripts written from right to left in Arabic letters. Methods for recognizing them are important for making use of the content of multilingual text images. Using the HTK toolkit, we build separate hidden Markov model (HMM) based recognition systems for printed Uyghur and printed Arabic text; the feature extraction stage uses distribution density features and local direction features. We also use HTK to build statistical language models for Uyghur and for Arabic, and apply these language models to improve recognition performance. Experimental results show that statistical language models effectively improve recognition accuracy: on a printed Uyghur test set of 24,000 words, the language model raises the recognition rate from 78.28% to 97.45%, and on a printed Arabic test set of 759 words, from 79.07% to 85.80%.
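The abstract does not state exactly how HMM scores and language model scores are combined. A common formulation (HTK's HVite, for instance, exposes a grammar scale factor `-s` and a word insertion penalty `-p`) chooses the word sequence that maximizes log P_HMM + s·log P_LM + p·N, where N is the number of words. The sketch below illustrates that idea as N-best rescoring with an add-k smoothed word bigram model. It is not the authors' implementation: the function names, the toy "words" `a` to `e`, the HMM log-likelihoods and the scale values are all hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences, k=1.0):
    """Train an add-k smoothed word bigram model; return a log-probability function.

    Simplification (assumption): the vocabulary size V comes from the training
    data only, so unseen words still receive probability mass but the
    distribution is not strictly normalized over them.
    """
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])              # history counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    V = len(vocab)

    def logprob(prev, word):
        # Counter returns 0 for missing keys, so unseen pairs get k / (count + kV)
        return math.log((bigrams[(prev, word)] + k) / (unigrams[prev] + k * V))
    return logprob

def sentence_logprob(logprob, words):
    """Sum bigram log-probabilities over a sentence padded with <s> and </s>."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(logprob(a, b) for a, b in zip(tokens[:-1], tokens[1:]))

def rescore(nbest, logprob, lm_scale=10.0, word_penalty=0.0):
    """Pick the (words, hmm_loglik) hypothesis with the best combined score."""
    def total(hyp):
        words, hmm_ll = hyp
        return (hmm_ll
                + lm_scale * sentence_logprob(logprob, words)
                + word_penalty * len(words))
    return max(nbest, key=total)

# Hypothetical toy data: training text and an N-best list from an HMM recognizer.
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
lp = train_bigram(corpus)
nbest = [(["a", "b", "c"], -100.0), (["a", "e", "c"], -99.0)]
print(rescore(nbest, lp, lm_scale=0.0)[0])   # HMM score alone: ['a', 'e', 'c']
print(rescore(nbest, lp, lm_scale=10.0)[0])  # with the LM: ['a', 'b', 'c']
```

With the scale set to zero, the acoustically slightly better but linguistically implausible hypothesis wins. Once the bigram term is weighted in, the frequent word sequence is preferred. This mirrors, in miniature, the kind of correction a language model can contribute. In practice, the scale factor and penalty would be tuned on held-out data.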

Keywords: hidden Markov model; statistical language model; Uyghur; Arabic; recognition

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
