Burmese Segmentation Methods and Its Implementation  Cited by: 1


Authors: 马昌娥 (Ma Chang'e), 杨鉴 (Yang Jian)[1]

Affiliation: [1] School of Information, Yunnan University, Kunming, Yunnan

Source: Computer Science and Application (《计算机科学与应用》), 2018, No. 11, pp. 1682-1688 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (61262068).

Abstract: Unlike English and other Western languages, Burmese has no explicit delimiters marking word boundaries, so word segmentation is an important step in building a Burmese speech synthesis system. We selected 5000 complete sentences from a raw corpus of about 600 M and had them manually segmented by Burmese language experts to form the experimental dataset for this paper. We compare a Burmese word segmentation method based on conditional random fields (CRF) with one based on the forward maximum matching algorithm (FMM), and evaluate segmentation performance by confidence, segmentation precision, and segmentation speed. In this experiment, the CRF-based and FMM-based methods reached confidence of 94.1% and 84.3%, respectively, and F-values of 93.8% and 82.9%, respectively. These results indicate that the CRF method segments Burmese more accurately than FMM and that it can meet the requirements of developing a Burmese speech synthesis system.
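For readers unfamiliar with the forward maximum matching (FMM) baseline named in the abstract, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes a plain set of known words as the lexicon and matches over Python characters; the abstract does not say whether the paper matches over characters or syllables, and the function names (fmm_segment, spans, prf) and the toy Latin-script lexicon are hypothetical. The evaluation helper computes word-level precision, recall, and F-value from matching word spans; the "confidence" measure reported in the abstract is not defined there, so it is not reproduced.

```python
def fmm_segment(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, greedily take the
    longest lexicon entry; fall back to one character if none matches."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def prf(predicted, gold):
    """Word-level precision, recall, and F-value from matching spans."""
    p_spans, g_spans = spans(predicted), spans(gold)
    correct = len(p_spans & g_spans)
    precision = correct / len(p_spans) if p_spans else 0.0
    recall = correct / len(g_spans) if g_spans else 0.0
    f_value = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_value


if __name__ == "__main__":
    toy_lexicon = {"ab", "abc", "cd", "d"}   # hypothetical lexicon
    predicted = fmm_segment("abcd", toy_lexicon)
    print(predicted)                          # greedy choice takes "abc" first
    print(prf(predicted, ["ab", "cd"]))       # scored against a hand-made gold split
```

The toy input shows the typical weakness of greedy matching: taking the longest entry first can block a correct split later in the string, which is one plausible reason a sequence model such as CRF scores higher in the reported comparison.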

Keywords: Burmese; word segmentation; conditional random field; forward maximum matching algorithm

Classification code: TP39 [Automation and Computer Technology: Computer Application Technology]

 
