Entity Extraction Method for the Civil Aviation Domain Based on Multi-Model Fusion (基于多模型融合的民航领域实体抽取方法)


Authors: MA Xiao-ning (马晓宁) [1]; ZHAO Dong-ge (赵东阁). College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Affiliation: [1] College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300

Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 8, pp. 2516-2522 (7 pages)

Funding: Scientific Research Program of the Tianjin Municipal Education Commission (2019KJ127).

Abstract: Extracting useful entities from massive civil aviation information still suffers from inaccurate boundaries for domain-specific terms and incomplete capture of sentence meaning. To address this, a deep learning model based on pre-training and model fusion is proposed. The autoencoding language model BERT performs semantic encoding to produce character vectors, and it is combined with a long short-term memory (LSTM) network and a conditional random field (CRF) to obtain two base models. Experiments on a self-annotated dataset show that entity labels obtained by weighted fusion of the base models yield a significantly higher F1 score than other methods, which effectively addresses both the unclear entity boundaries found in specialized domains such as civil aviation and the overfitting that occurs with small samples.
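
The weighted fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so everything below is an assumption made for illustration: each base model is assumed to emit a per-character probability distribution over the tag set, the two distributions are combined as a convex combination with a single weight `weight_a`, and the highest-scoring tag is chosen. The tag set `LABELS` and the function name `fuse_weighted` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# Hypothetical BIO tag set for illustration only; the paper's label scheme is not given.
LABELS = ["O", "B-AIRPORT", "I-AIRPORT"]


def fuse_weighted(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    weight_a: float = 0.5,
) -> List[int]:
    """Fuse two models' per-character label probabilities by weighted averaging.

    probs_a[i][k] is model A's probability that character i carries label k;
    probs_b is the same for model B. Returns the index of the best fused
    label for each character. (Assumed scheme, not the paper's exact method.)
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of characters")

    fused: List[int] = []
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same label set")
        # Convex combination of the two distributions for this character.
        scores = [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        # Pick the label with the highest fused score.
        fused.append(max(range(len(scores)), key=scores.__getitem__))
    return fused


# Toy probabilities for a three-character input (made-up numbers).
model_a = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.7, 0.1, 0.2]]
model_b = [[0.3, 0.6, 0.1], [0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]

fused_tags = [LABELS[k] for k in fuse_weighted(model_a, model_b, weight_a=0.4)]
print(fused_tags)
```

In this toy input the two models disagree on the first character; with `weight_a=0.4` the fused score favors model B's entity-start tag. In practice the weight would presumably be tuned on a validation set, and fusing before or after CRF decoding is a separate design choice that the abstract does not settle.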

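Keywords: pre-training; named entity recognition; deep learning; civil aviation information; ensemble learning; overfitting; semantic encoding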

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
