Research on Uyghur Word Segmentation Based on the BI-LSTM-CRF Model  Cited by: 1

Uyghur word segmentation based on bidirectional long short term memory conditional random field model

Authors: 孙雅婧 (SUN Ya-jing); 李成华 (LI Cheng-hua) [1]; 杨斌 (YANG Bin); 江小平 (JIANG Xiao-ping) [1]; 艾提日也古丽·艾尼瓦尔 (ATTRYE·Anwar) (School of Electronic Information Engineering, South-Central University for Nationalities, Wuhan 430070, China; School of Education, South-Central University for Nationalities, Wuhan 430070, China)

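Affiliations: [1] School of Electronic Information Engineering, South-Central University for Nationalities, Wuhan, Hubei 430070 [2] School of Education, South-Central University for Nationalities, Wuhan, Hubei 430070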

Source: Journal of Qinghai Normal University (Natural Science Edition) (《青海师范大学学报(自然科学版)》), 2019, No. 4, pp. 5-12 (8 pages)

Funding: Natural Science Foundation of Hubei Province (2017CFB874); Fundamental Research Funds for the Central Universities (CZY17001)

Abstract: Building on a thorough study of the morphological characteristics of Uyghur, this paper formulates word segmentation rules and manually annotates raw text to build a corpus. To address the heavy reliance of traditional machine-learning segmentation methods on background knowledge and feature selection, it proposes a bidirectional long short-term memory conditional random field (BI-LSTM-CRF) network, an extension of the long short-term memory (LSTM) network, for Uyghur word segmentation; the model can make effective use of both past and future input features. In experiments comparing it with a conditional random field (CRF) model, which represents traditional machine-learning methods, the BI-LSTM-CRF model shows clearly improved segmentation performance and good generalization ability.
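
The abstract names the model but not its internals. In the usual BI-LSTM-CRF formulation, a bidirectional LSTM produces a score for each tag at each position, and a CRF layer adds tag-to-tag transition scores so that the best tag sequence is chosen as a whole rather than position by position. The sketch below illustrates only that decoding step, in pure Python. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the character-level B/M/E/S tag set, the hard-coded transition scores (a trained CRF would learn them), the hand-written emission scores standing in for BiLSTM outputs, and the Latin-script toy word "kitablar".

```python
# Toy sketch of the CRF decoding step of a BI-LSTM-CRF tagger (pure Python).
# Assumptions (not from the paper): B/M/E/S tags, hand-written scores.

TAGS = ["B", "M", "E", "S"]  # begin, middle, end, single-character segment
NEG = -1e4                   # penalty that effectively forbids a transition

# Tag bigrams that yield a well-formed segmentation.
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
transitions = [[0.0 if (a, b) in LEGAL else NEG for b in TAGS] for a in TAGS]
start = [0.0, NEG, NEG, 0.0]  # a sequence may start with B or S
end = [NEG, NEG, 0.0, 0.0]    # and may end only with E or S


def viterbi_decode(emissions, transitions, start, end):
    """Return (best_score, best_tag_indices) for one sequence.

    emissions[t][j]   : score of tag j at position t (stand-in for BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(start)
    # scores[j] = best score of any partial path ending in tag j
    scores = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j]
                              + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    final = [scores[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    # Follow the backpointers from the last position to the first.
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return final[last], path


def tags_to_segments(chars, tags):
    """Cut the character sequence after every E or S tag."""
    segments, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            segments.append(current)
            current = ""
    if current:  # keep any unfinished trailing segment
        segments.append(current)
    return segments


# Toy input: "kitablar", intended segmentation kitab + lar.
chars = list("kitablar")
intended = "BMMMEBME"
emissions = [[2.0 if name == g else 0.0 for name in TAGS] for g in intended]
emissions[4] = [0.0, 1.5, 1.0, 0.0]  # position 4 locally prefers M over E

# Position-by-position argmax ignores the transition constraints.
greedy = [TAGS[max(range(len(TAGS)), key=row.__getitem__)] for row in emissions]

score, path = viterbi_decode(emissions, transitions, start, end)
tags = [TAGS[i] for i in path]
segments = tags_to_segments(chars, tags)

print("greedy :", "".join(greedy))
print("viterbi:", "".join(tags), segments, score)
```

In this toy input the per-position argmax picks M at position 4 and produces the ill-formed bigram M followed by B, while the joint decoding pays the small local cost of choosing E and returns a well-formed segmentation. This is the kind of sequence-level constraint a CRF layer adds on top of per-position scores.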

Keywords: Uyghur word segmentation; BI-LSTM-CRF; CRF; comparative experiment

Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
