Pun detection based on pseudo-label and transfer learning


Authors: JIANG Siyu, ZHANG Zhiheng, JIANG Libiao, MA Le, CHEN Boyuan, WANG Lianxi, ZHAO Liang

Affiliations:
[1] School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, P.R. China
[2] School of Software, South China University of Technology, Guangzhou 510000, P.R. China
[3] School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510000, P.R. China
[4] School of Mechanical Engineering, Guangzhou City University of Technology, Guangzhou 510800, P.R. China
[5] Engineering Research Institute, Guangzhou City University of Technology, Guangzhou 510800, P.R. China
[6] College of Further Education, Guangdong Industry Polytechnic, Guangzhou 510300, P.R. China

Source: Journal of Chongqing University, 2024, No. 2, pp. 51-61 (11 pages)

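Funding: Guangzhou Science and Technology Plan Project (202102020637, 202002030227); Guangdong University of Foreign Studies Teacher-Student Collaboration Project (21SS10).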

Abstract: To address the shortage of pun samples, this paper proposes a pun detection model based on pseudo-labels and transfer learning. First, the model generates pseudo-labels using contextual semantics, phoneme vectors, and an attention mechanism. Then, transfer learning is combined with confidence scores to select usable pseudo-labels. Finally, the pseudo-labeled data and the real data are mixed and fed into the network for training, and the pseudo-labeling and mixed-training steps are repeated. This alleviates, to a certain extent, the problem that pun samples are scarce and difficult to obtain. Pun detection experiments with the model on the SemEval 2017 shared task 7 dataset and the Pun of the Day dataset show that it outperforms existing mainstream pun detection methods on both datasets.
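The iterative procedure described in the abstract (generate pseudo-labels, keep only confident ones, retrain on real plus pseudo-labeled data, repeat) is a form of self-training. The sketch below is not the authors' model: it swaps in a toy nearest-centroid classifier over hypothetical 2-D feature vectors in place of the contextual-semantics/phoneme/attention network, and a fixed confidence threshold in place of the paper's transfer-learning-based selection. The names `CONF_THRESHOLD`, `NUM_ROUNDS`, `fit_centroids`, `predict_proba`, and `self_train`, and the toy data, are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

# Hypothetical hyperparameters (assumed; not taken from the paper).
CONF_THRESHOLD = 0.9
NUM_ROUNDS = 3


def fit_centroids(data: List[Tuple[Vector, int]]) -> Dict[int, Vector]:
    """Toy stand-in for the network: one centroid per class (0 = non-pun, 1 = pun)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for (x, y), label in data:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict_proba(centroids: Dict[int, Vector], v: Vector) -> Dict[int, float]:
    """Softmax over negative squared distances; used here as the confidence score."""
    scores = {lab: -((v[0] - c[0]) ** 2 + (v[1] - c[1]) ** 2) for lab, c in centroids.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(s - m) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}


def self_train(labeled: List[Tuple[Vector, int]], unlabeled: List[Vector]):
    """Repeat: pseudo-label, keep confident samples, retrain on the real + pseudo mix."""
    pseudo: List[Tuple[Vector, int]] = []
    remaining = list(unlabeled)
    centroids = fit_centroids(labeled)
    for _ in range(NUM_ROUNDS):
        still_unlabeled = []
        for v in remaining:
            probs = predict_proba(centroids, v)
            label = max(probs, key=probs.get)
            if probs[label] >= CONF_THRESHOLD:
                pseudo.append((v, label))  # accept a confident pseudo-label
            else:
                still_unlabeled.append(v)  # low confidence: revisit next round
        remaining = still_unlabeled
        # Mixed training on real labels plus all accepted pseudo-labels.
        centroids = fit_centroids(labeled + pseudo)
    return centroids, pseudo, remaining


# Usage with made-up toy data.
labeled = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((3.0, 3.0), 1), ((2.8, 3.1), 1)]
unlabeled = [(0.1, 0.3), (2.9, 2.7), (1.5, 1.5)]
centroids, pseudo, remaining = self_train(labeled, unlabeled)
```

In this toy run the two points near a class centroid receive pseudo-labels in the first round, while the ambiguous point (1.5, 1.5) never clears the threshold and stays unlabeled. Raising `CONF_THRESHOLD` admits fewer pseudo-labels but reduces label noise, which is the trade-off a confidence-based selection step is meant to manage.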

Keywords: pun detection; pseudo-label; transfer learning

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
