Chinese address segmentation based on the ELMo-BiLSTM-CRF model  Cited by: 5

Authors: 余俊 (YU Jun); 于文年 (YU Wennian); 彭艳兵 (PENG Yanbing) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430070, China; Nanjing FiberHome Tiandi Co., Ltd., Nanjing 210019, China)

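Affiliations: [1] Wuhan Research Institute of Posts and Telecommunications, Wuhan, Hubei 430070, China [2] Nanjing FiberHome Tiandi Co., Ltd., Nanjing, Jiangsu 210019, China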

Source: Electronic Design Engineering (《电子设计工程》), 2021, No. 20, pp. 72-76 (5 pages)

Abstract: Traditional rule-based methods for Chinese address segmentation suffer from low segmentation efficiency, require manually maintained dictionaries, and cannot correctly parse ambiguous fields in Chinese addresses. To address these problems, this paper proposes adopting an ELMo pre-trained model together with a nested BiLSTM-CRF method to improve overall segmentation efficiency. The model relies on three properties: the word vectors generated by ELMo depend on context, BiLSTM can effectively extract features from the input sequence, and the CRF can be trained and optimized through its state transition matrix. Using a self-built training sample set, ELMo-BiLSTM-CRF, BiLSTM-CRF, and BiLSTM models were trained and compared. The results show that the ELMo-BiLSTM-CRF model segments addresses better and achieves higher accuracy.
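To illustrate the role of the CRF state transition matrix mentioned in the abstract, here is a minimal sketch of Viterbi decoding over per-character tag scores. It is not the authors' implementation. The B/I/E/S tagging scheme, the emission scores (stand-ins for BiLSTM outputs), the hand-written transition matrix, and the names `viterbi` and `tags_to_segments` are all assumptions made for illustration; in a trained CRF the transition scores would be learned.

```python
TAGS = ["B", "I", "E", "S"]  # begin / inside / end / single-character segment
NEG = -1e4  # large penalty standing in for a "forbidden" transition

# Hypothetical transition scores (rows: previous tag, columns: next tag).
# A trained CRF would learn these values; here they only encode validity.
TRANSITIONS = [
    #  B    I    E    S
    [NEG, 0.0, 0.0, NEG],  # after B: only I or E
    [NEG, 0.0, 0.0, NEG],  # after I: only I or E
    [0.0, NEG, NEG, 0.0],  # after E: only B or S
    [0.0, NEG, NEG, 0.0],  # after S: only B or S
]


def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of tag indices."""
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


def tags_to_segments(chars, tags):
    """Cut the character sequence after every E or S tag."""
    segments, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            segments.append(current)
            current = ""
    if current:  # keep any unfinished trailing segment
        segments.append(current)
    return segments


chars = "江苏省南京市"
# Made-up emission scores per character, in TAGS order (B, I, E, S).
emissions = [
    [2.0, 0.1, 0.1, 0.5],  # 江
    [0.2, 1.5, 1.0, 0.1],  # 苏
    [0.1, 0.3, 2.0, 0.4],  # 省
    [1.0, 1.2, 0.1, 0.2],  # 南 (greedy choice would be I, invalid after E)
    [0.1, 1.5, 0.8, 0.1],  # 京
    [0.1, 0.2, 2.0, 0.5],  # 市
]
decoded = [TAGS[i] for i in viterbi(emissions, TRANSITIONS)]
print(decoded)                           # ['B', 'I', 'E', 'B', 'I', 'E']
print(tags_to_segments(chars, decoded))  # ['江苏省', '南京市']
```

In this toy input, picking the highest-scoring tag for each character independently would label 南 as I directly after an E, which is not a valid tag sequence. Decoding jointly with the transition scores selects B for that character instead, which is the kind of sequence-level consistency a CRF layer contributes on top of per-character features.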

Keywords: Chinese address; Chinese address segmentation; ELMo-BiLSTM-CRF model; pre-trained model

Classification number: TN927.2 [Electronics and Telecommunications: Communication and Information Systems]

 
