Solving the Pinyin-to-Chinese-Character Conversion Problem Based on Hybrid Word Lattice (基于混合字词网格的汉语音字转换问题的求解)  Cited by: 5

Author: Zhang Sen (章森)[1]

Affiliation: [1] Laboratory of Information and Computing Science, Beijing University of Technology, Beijing 100022

Source: Chinese Journal of Computers (《计算机学报》), 2007, No. 7, pp. 1145-1153 (9 pages)

Funding: This work was supported by the National Natural Science Foundation of China (60572125).

Abstract: Pinyin-to-Chinese-character conversion underlies Chinese keyboard input, Chinese speech recognition, and Chinese information processing, and it remains a very challenging problem. This paper reviews the state of the art in Pinyin-to-Chinese-character conversion and analyzes its open problems, then proposes a conversion method based on a hybrid word lattice. It presents the architecture of the system implementation, examines issues related to the hybrid 2-gram model and the algorithms for solving the word lattice, and finally discusses how automatic prediction and system learning are implemented. A prototype system built on this approach was compared with the Microsoft Pinyin input system on Windows XP and showed a significant improvement in the accuracy of automatic Pinyin-to-character conversion.
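
The idea of searching a lattice that mixes single characters and multi-character words under a 2-gram model can be illustrated with a small sketch. This is not the paper's implementation: the lexicon, the bigram log-probabilities, the flat `BACKOFF` penalty, and the function names `build_lattice` and `decode` are all hypothetical, and a standard Viterbi search is assumed as the lattice-solving step.

```python
from collections import defaultdict

# Hypothetical toy lexicon: pinyin syllable sequence -> candidate characters or words.
LEXICON = {
    ("bei",): ["北", "被"],
    ("jing",): ["京", "经"],
    ("bei", "jing"): ["北京", "背景"],
    ("da",): ["大"],
    ("xue",): ["学"],
    ("da", "xue"): ["大学"],
}

# Hypothetical bigram log-probabilities (illustrative values only).
BIGRAM = {
    ("<s>", "北京"): -1.0,
    ("北京", "大学"): -0.5,
    ("<s>", "背景"): -2.5,
    ("背景", "大学"): -4.0,
    ("<s>", "北"): -3.0,
    ("北", "京"): -1.0,
    ("京", "大"): -3.0,
    ("大", "学"): -1.0,
}
BACKOFF = -8.0  # assumed flat penalty for unseen bigrams


def build_lattice(syllables, max_len=4):
    """Return edges[end] = [(start, token)], where token covers syllables[start:end]."""
    edges = defaultdict(list)
    n = len(syllables)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            for token in LEXICON.get(tuple(syllables[start:end]), []):
                edges[end].append((start, token))
    return edges


def decode(syllables):
    """Viterbi search over the lattice; returns the best string or None."""
    n = len(syllables)
    edges = build_lattice(syllables)
    # best[pos][token] = (score, (previous position, previous token))
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, n + 1):
        for start, token in edges[end]:
            for prev, (score, _) in best[start].items():
                cand = score + BIGRAM.get((prev, token), BACKOFF)
                if token not in best[end] or cand > best[end][token][0]:
                    best[end][token] = (cand, (start, prev))
    if not best[n]:
        return None  # no path covers the whole input
    pos, token = n, max(best[n], key=lambda t: best[n][t][0])
    out = []
    while token != "<s>":
        out.append(token)
        _, (pos, token) = best[pos][token]
    return "".join(reversed(out))


print(decode(["bei", "jing", "da", "xue"]))
```

Because character edges and word edges share one lattice, the search can fall back to single characters wherever no word candidate covers a span, which is the point of a hybrid lattice in this sketch.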

Keywords: Pinyin-to-Chinese-character conversion; N-gram language model; Markov model; word lattice; user behavior

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
