基于BERT的中文电子简历命名实体识别

Recognition of named entity in Chinese e-resume based on BERT


Authors: WANG Chuantao; DING Linkai[1]; YANG Xuexin; HU Qi (School of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; Beijing Key Laboratory of Performance Guarantee on Urban Rail Transit Vehicles, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China)

Affiliations: [1] School of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; [2] Beijing Key Laboratory of Performance Guarantee on Urban Rail Transit Vehicles, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; [3] Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

Source: China Sciencepaper (《中国科技论文》), 2021, No. 7, pp. 770-775, 782 (7 pages)

Funding: National Natural Science Foundation of China (11774380); Beijing Young Top-Notch Talent Cultivation Program (CIT&TCD201704052).

Abstract: To address the low efficiency and poor transferability of traditional rule-based resume entity extraction methods, a deep learning model based on bidirectional encoder representations from Transformers (BERT) is proposed to recognize the relevant named entities. The model uses BERT to encode resume text at the character level, yielding character vectors that capture contextual information; a bidirectional long short-term memory (BiLSTM) network then extracts features from these vectors and outputs scores for all possible label sequences to a conditional random field (CRF), which finally decodes the entity label sequence. Experimental results show that the BERT-BiLSTM-CRF model outperforms other traditional models on resume entity recognition, achieving the highest F1 value of 94.82%.
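As a concrete illustration of the architecture described in the abstract, the sketch below shows one possible BERT-BiLSTM-CRF tagger in PyTorch. It is not the authors' implementation: the bert-base-chinese checkpoint, the Hugging Face transformers BertModel, the pytorch-crf CRF layer, and all hyperparameters are assumptions chosen for illustration only.

# A minimal sketch of a BERT-BiLSTM-CRF tagger, assuming the Hugging Face
# `transformers` and `pytorch-crf` packages; hyperparameters are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256):
        super().__init__()
        # Character-level contextual encoding (Chinese BERT tokenizes per character).
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        # BiLSTM extracts sequence features from the BERT character vectors.
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Linear layer produces per-position tag scores (CRF emission scores).
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)
        # CRF scores whole label sequences and decodes the globally best one.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        feats, _ = self.bilstm(out.last_hidden_state)
        emissions = self.emission(feats)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi-decode the most likely tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)

In this sketch the loss is the sequence-level CRF negative log-likelihood rather than a per-token cross-entropy, which matches the abstract's description of scoring entire label sequences before decoding.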

Keywords: text information processing; electronic resume; deep learning; bidirectional long short-term memory; named entity recognition

CLC number: TP391.1 [Automation and Computer Technology - Computer Application Technology]

 
