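Affiliation: [1] School of Information, Yunnan University, Kunming, Yunnan
Source: Computer Science and Application (《计算机科学与应用》), 2018, No. 11, pp. 1682-1688 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61262068).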
Abstract: Unlike English and other Western languages, Burmese has no explicit delimiters marking word boundaries, so word segmentation is an important step in building a Burmese speech synthesis system. From a raw corpus of about 600 M, we selected 5000 complete sentences, which were manually segmented by Burmese language experts and serve as the experimental dataset for this paper. We compare a Burmese word segmentation method based on conditional random fields (CRF) with one based on forward maximum matching (FMM), and evaluate segmentation performance in terms of confidence, segmentation precision, and segmentation speed. In this experiment, the CRF-based and FMM-based methods reached confidence of 94.1% and 84.3%, respectively, and F-values of 93.8% and 82.9%, respectively. The results show that the CRF method segments Burmese text more accurately and that it can meet the requirements for developing a Burmese speech synthesis system.
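For readers unfamiliar with the FMM baseline named in the abstract, the following is a minimal sketch of forward maximum matching in general, not the authors' implementation. The function name `fmm_segment`, the toy Latin-letter lexicon, and the single-character fallback are all assumptions made for illustration; the abstract does not say what unit (character or syllable) or dictionary the paper matches over.

```python
def fmm_segment(text, lexicon, max_len=None):
    """Greedy forward maximum matching over characters (illustrative only).

    At each position, take the longest lexicon entry that starts there;
    if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest entry in the lexicon bounds the search window.
        max_len = max((len(w) for w in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, not taken from the paper's data.
toy_lexicon = {"ab", "abc", "cd", "d"}
print(fmm_segment("abcd", toy_lexicon))
print(fmm_segment("abxcd", toy_lexicon))
```

Because the matcher commits to the longest prefix at each step, it picks "abc" over "ab" in the first call even though "ab" + "cd" is also a valid split. This kind of greedy error is one reason a dictionary-based method may trail a sequence model such as CRF.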
Classification: TP39 [Automation and Computer Technology: Computer Application Technology]