Event Argument Extraction Method Based on Knowledge Distillation and Model Ensemble (基于知识蒸馏与模型集成的事件论元抽取方法)

Cited by: 1

Authors: WANG Shihao (王士浩), WANG Zhongqing (王中卿)[1], LI Shoushan (李寿山)[1], ZHOU Guodong (周国栋)[1]

Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Source: Computer Engineering (《计算机工程》), 2022, No. 7, pp. 97-103 (7 pages)

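Funding: National Natural Science Foundation of China (61806137, 61702518); Natural Science Research General Program of Jiangsu Higher Education Institutions (18KJB520043).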

Abstract: State-of-the-art event argument extraction methods typically use BERT as the encoder, but BERT's large parameter count reduces efficiency and prevents these models from running on devices with limited computing resources. This paper proposes EAEDE, an Event Argument Extraction method via knowledge Distillation and model Ensemble, which distills an event argument extraction teacher model into two different student models and then ensembles the two students. The teacher model is built from BERT and a graph convolutional network (GCN); the two student models use a single-layer convolutional neural network (CNN) and a single-layer long short-term memory (LSTM) network, respectively. Distillation proceeds in two steps. First, a mean squared error (MSE) loss distills the teacher's intermediate-layer vectors into the student models. Then the classification layer is distilled, using an MSE loss and a cross-entropy (CE) loss so that the students learn both from the teacher's classification-layer outputs (logits) and from the ground-truth labels. Finally, the two student models are combined by weighted averaging to improve event argument extraction performance. Experiments on the ACE2005 English dataset show that, compared with the student models, the method improves the event argument extraction F1 score by 5.05 percentage points on average, while reducing inference time by 90.85% and parameter count by 99.25% relative to the teacher model.
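The two distillation steps and the weighted-average ensemble described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the mixing weight `alpha` between the MSE and CE terms, the ensemble weight `weight_a`, and the assumption that student and teacher intermediate vectors share the same dimension are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_entropy(logits, label):
    """Cross-entropy of the softmax of `logits` against a gold label index."""
    return -math.log(softmax(logits)[label])

def hidden_distill_loss(student_hidden, teacher_hidden):
    # Step 1: the student matches the teacher's intermediate-layer vector.
    # Assumption: both vectors already have the same dimension.
    return mse(student_hidden, teacher_hidden)

def logit_distill_loss(student_logits, teacher_logits, label, alpha=0.5):
    # Step 2: MSE against the teacher's logits plus CE against the gold label.
    # `alpha` is a hypothetical mixing weight, not a value from the paper.
    soft = mse(student_logits, teacher_logits)
    hard = cross_entropy(student_logits, label)
    return alpha * soft + (1 - alpha) * hard

def ensemble(probs_a, probs_b, weight_a=0.5):
    # Weighted average of the two students' class probabilities.
    # `weight_a` is a hypothetical ensemble weight.
    return [weight_a * p + (1 - weight_a) * q for p, q in zip(probs_a, probs_b)]

# Usage: combine made-up predictions from a CNN student and an LSTM student.
cnn_probs = softmax([2.0, 0.5, -1.0])
lstm_probs = softmax([0.2, 1.5, -0.5])
combined = ensemble(cnn_probs, lstm_probs, weight_a=0.6)
predicted_role = max(range(len(combined)), key=combined.__getitem__)
```

In this sketch the ensemble averages probabilities rather than logits; the abstract only states that a weighted average is used, so either reading is possible.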

Keywords: event argument extraction; knowledge distillation; model ensemble; pre-trained language model; model compression

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
