Authors: MA Xiao-ning (马晓宁) [1]; ZHAO Dong-ge (赵东阁)
Affiliation: [1] College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 8, pp. 2516-2522 (7 pages)
Funding: Scientific Research Program of the Tianjin Municipal Education Commission (2019KJ127).
Abstract: Extracting useful entities from massive volumes of civil aviation information still suffers from inaccurate boundaries for domain-specific terms and from incomplete capture of sentence meaning. To address this, a deep learning model based on pre-training and model fusion is proposed. The autoencoding language model BERT performs semantic encoding to produce character vectors, and it is combined with long short-term memory (LSTM) networks and conditional random fields (CRF) to form two base models. Experiments on a self-annotated dataset show that the entity labels obtained by weighted fusion of the base models yield a significantly higher F1 score than other methods. The approach effectively addresses the unclear entity boundaries found in civil aviation and other specialized domains, as well as model overfitting when only small samples are available.
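Keywords: pre-training; named entity recognition; deep learning; civil aviation information; ensemble learning; overfitting; semantic encoding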
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
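The abstract mentions weighted fusion of two base models but does not say how the fusion is computed. As a rough illustration only, the sketch below assumes each base model emits a per-character probability distribution over the same label set, and that fusion is a weighted average followed by an argmax. The function name `fuse_labels`, the parameter `weight_a`, and the label set are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_labels(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    labels: Sequence[str],
    weight_a: float = 0.5,
) -> List[str]:
    """Fuse two models' per-character label distributions (assumed scheme).

    probs_a / probs_b: one probability vector per character, aligned with `labels`.
    weight_a: weight given to model A; model B receives 1 - weight_a.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of characters")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    fused: List[str] = []
    for dist_a, dist_b in zip(probs_a, probs_b):
        # Weighted average of the two distributions for this character.
        scores = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(dist_a, dist_b)]
        # Pick the label with the highest fused score.
        best = max(range(len(scores)), key=scores.__getitem__)
        fused.append(labels[best])
    return fused


# Hypothetical label set and scores for a two-character input.
LABELS = ["O", "B-ENT", "I-ENT"]
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

equal_weights = fuse_labels(model_a, model_b, LABELS, weight_a=0.5)
favor_model_a = fuse_labels(model_a, model_b, LABELS, weight_a=0.9)
```

With equal weights the two models are treated symmetrically; raising `weight_a` lets model A dominate wherever the models disagree. In practice the weight would presumably be tuned on held-out data, which the abstract does not detail.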