The Method for Identifying Reply-to Relation Guided by Dynamic Inquiry Window

Times cited: 1

Authors: ZHANG Jingwen, CUI Shiyao, ZHANG Xinghua, SU Taoyu, LIU Tingwen[1,2]

Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 101408, China

Source: Journal of Integration Technology, 2024, Issue 5, pp. 53-63 (11 pages)

Funding: National Key Research and Development Program of China (2021YFB3100600).

Abstract: In multi-party conversations, identifying the reply-to relation between messages is an important task in the dialogue domain. Existing work has not yet addressed two issues related to data distribution. First, shorter messages tend to appear more frequently, yet short texts carry little semantic information, which limits what the model can learn. Second, positive samples (message pairs that stand in a reply-to relation) are usually far outnumbered by negative samples, which skews the data during training and degrades the model's performance on positive samples. To address these two issues, this paper proposes an improved model built on a pre-trained language model: it first mitigates the short-text issue through dynamic inquiry window modeling, and then mitigates the positive-sample issue through position-driven positive sample weight optimization. In comparison with previous research, experimental results show that the improved model raises recall by an average of 15.7% over a baseline model based on a pre-trained language model. In addition, this paper constructs a new dataset collected from the Telegram platform, which can support subsequent related studies.
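The abstract does not specify how the position-driven positive sample weights are computed. As a hedged illustration of the general idea of countering positive/negative skew, the Python sketch below up-weights positive pairs in a binary cross-entropy loss by the negative-to-positive ratio, then scales that weight with a hypothetical function of reply distance (how many messages back the candidate parent lies). The names `positive_weight` and `weighted_bce` and the linear `decay` term are assumptions made for illustration; they are not the authors' formulation.

```python
import math
from typing import Sequence


def positive_weight(distance: int, base_weight: float, decay: float = 0.5) -> float:
    """HYPOTHETICAL position-driven weight for one positive pair.

    Assumption (not from the paper): a positive pair whose parent lies
    `distance` messages before the reply gets the class-ratio weight
    scaled by (1 + decay * (distance - 1)), so longer-range replies
    contribute more to the loss.
    """
    return base_weight * (1.0 + decay * (distance - 1))


def weighted_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    distances: Sequence[int],
    decay: float = 0.5,
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy with up-weighted positive samples.

    probs:     predicted probability that each candidate pair is a reply-to pair
    labels:    1 for a reply-to (positive) pair, 0 otherwise
    distances: message distance between reply and candidate parent (1 = adjacent)
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Base weight offsets class imbalance: rarer positives count more.
    base = n_neg / max(n_pos, 1)

    total = 0.0
    for p, y, d in zip(probs, labels, distances):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -positive_weight(d, base, decay) * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)
```

With balanced labels and every distance equal to 1, the positive weight is 1 and the loss reduces to ordinary binary cross-entropy; as negatives outnumber positives, each positive's contribution grows proportionally.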

Keywords: multi-party dialogue; reply-to relation discovery; inquiry window; data distribution; pre-trained language model

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
