Chinese Pinyin-to-Character Conversion Based on Cascaded Reranking (cited by: 1)


Authors: Li Xinxin [1,2], Wang Xuan [1,2], Yao Lin [1,3], Guan Jian [1,3]

Affiliations: [1] Research Center of Computer Application, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055; [2] Shenzhen Engineering Laboratory of Internet Multimedia Application Technology, Shenzhen 518055; [3] Public Service Platform of Mobile Internet Application Security Industry, Shenzhen 518057

Source: Acta Automatica Sinica (《自动化学报》), 2014, No. 4, pp. 624-634 (11 pages)

Funding: Supported by the National Major Science and Technology Project of the Ministry of Science and Technology (2011ZX03002-004-01) and the Shenzhen Key Basic Research Projects (JC201104210032A, JC201005260112A)

Abstract: The word n-gram language model is the most common approach for Chinese pinyin-to-character conversion. It is simple, efficient, and widely used in practice. However, in the decoding phase of the word n-gram model, the determination of each word depends only on its preceding neighboring words, which lacks long-distance grammatical and syntactic constraints. In this paper, we propose two reranking approaches to address this problem. The linear reranking approach uses a minimum error learning method to combine different sub-models, including word and character n-gram language models, a part-of-speech tagging model, and a dependency model. The averaged perceptron reranking approach reranks the candidates generated by the word n-gram model, employing features extracted from the word sequence, part-of-speech tags, and the dependency tree. Experimental results on the Lancaster Corpus of Mandarin Chinese and the People's Daily corpus show that both reranking approaches can efficiently utilize syntactic information and outperform the word n-gram model. Cascading the two approaches, with the probability output of the linear reranking approach used as the initial weight of the perceptron reranking approach, achieves the best performance.
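The two-stage cascade described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the sub-model names, weights, candidate sentences, and feature functions below are invented toy stand-ins for the actual word/character n-gram, POS-tagging, and dependency sub-models, and perceptron training and weight averaging are omitted (only inference-time rescoring is shown).

```python
# Toy sketch of cascaded reranking: stage 1 linearly combines sub-model
# scores (the paper tunes these weights by minimum error learning);
# stage 2 rescores with sparse features, using the stage-1 score as the
# initial weight. All concrete values here are invented for illustration.
from collections import defaultdict


def linear_rerank(candidates, sub_models, weights):
    """Stage 1: weighted sum of sub-model log-scores per candidate,
    returned as (score, candidate) pairs sorted best-first."""
    scored = [
        (sum(w * sub_models[name](cand) for name, w in weights.items()), cand)
        for cand in candidates
    ]
    scored.sort(key=lambda pair: -pair[0])
    return scored


def perceptron_rerank(scored_candidates, feature_fn, feat_weights, alpha=1.0):
    """Stage 2: perceptron-style rescoring. The stage-1 combined score
    enters as one more (dense) feature, alongside sparse features such
    as POS-tag or dependency patterns. Training is omitted here."""
    best, best_score = None, float("-inf")
    for stage1_score, cand in scored_candidates:
        score = alpha * stage1_score  # cascade: stage-1 output as initial weight
        for feat, value in feature_fn(cand).items():
            score += feat_weights[feat] * value
        if score > best_score:
            best, best_score = cand, score
    return best


# Toy example: choose between two candidate sentences for one pinyin input.
candidates = ["他是研究生", "他是研究升"]
sub_models = {
    "word_lm": lambda c: -1.0 if c.endswith("升") else -0.2,  # toy word LM
    "char_lm": lambda c: -0.5,                                # toy constant char LM
}
weights = {"word_lm": 0.7, "char_lm": 0.3}  # would be tuned by min-error learning
stage1 = linear_rerank(candidates, sub_models, weights)

feat_weights = defaultdict(float)
feat_weights["ends_with_noun"] = 0.5  # toy sparse feature weight
features = lambda c: {"ends_with_noun": 1.0 if c.endswith("生") else 0.0}
best = perceptron_rerank(stage1, features, feat_weights)
print(best)  # prints 他是研究生
```

The cascade order matters: stage 1 is cheap and narrows the candidate list with globally tuned sub-model weights, while stage 2 can afford richer sparse features over POS tags and dependency structure.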

Keywords: Chinese pinyin-to-character conversion; reranking; minimum error learning; perceptron method

Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]

 
