Similarity Calculation of News Texts and Comments Combined with Contrastive Learning

Authors: WANG Hong-bin (王红斌)[1], ZHANG Zhuo (张卓), LAI Hua (赖华) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500, China)

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500; [3] Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 12, pp. 2671-2677 (7 pages)

Funding: National Natural Science Foundation of China (61966020); Yunnan Provincial Basic Research Program, General Project (CB22052C143A).

Abstract: Similarity calculation between news texts and news comments aims to filter out the comments that are related to a given news text. Because most comments evaluate the news in the form of short texts, the task is essentially a similarity calculation between a long text and a short text. Traditional long-text processing methods tend to lose textual information and blur the article's topic, which lowers the accuracy of similarity calculation. To address the length gap between news texts and comments, and taking the characteristics of comments into account, this paper proposes a similarity calculation method for news texts and comments that incorporates contrastive learning. The method compresses the news text by extracting keywords, which also reduces redundant information; the keyword sequence is concatenated with the news title to form the representation of the news text; positive and negative examples are then constructed with the BERT pre-trained model using contrastive learning; finally, the pre-trained model is fine-tuned with cross-entropy and relative-entropy (KL divergence) loss functions to compute text similarity. Experiments show that the proposed method improves accuracy by 3.6% over recent long-text processing methods, and it also achieves good results on public Chinese text similarity datasets.
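The pipeline described in the abstract (keyword-based compression of the news text, concatenation with the title, BERT encoding, and scoring against a comment) can be illustrated with a minimal sketch. This is not the authors' released code: jieba's TF-IDF extractor stands in for the unspecified keyword-extraction step, the bert-base-chinese checkpoint and mean pooling are assumed choices, and the combined cross-entropy / relative-entropy loss uses placeholder soft targets and an assumed weight alpha.

```python
# Minimal sketch of the described pipeline (not the authors' code).
# Assumptions: jieba TF-IDF keyword extraction, bert-base-chinese, and
# mask-aware mean pooling; the abstract does not fix these choices.
import jieba.analyse
import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def compress_news(title: str, body: str, top_k: int = 20) -> str:
    """Represent the long news text as 'title [SEP] keywords' to cut redundancy."""
    keywords = jieba.analyse.extract_tags(body, topK=top_k)  # TF-IDF keywords
    return title + "[SEP]" + " ".join(keywords)

def encode(texts):
    """Mask-aware mean-pooled BERT sentence embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (B, H)

def similarity(news_repr: str, comment: str) -> float:
    """Cosine similarity between the compressed news text and one comment."""
    a, b = encode([news_repr, comment])
    return F.cosine_similarity(a, b, dim=0).item()

def fine_tune_loss(logits, labels, soft_targets, alpha: float = 0.5):
    """Cross-entropy on hard labels plus relative entropy (KL divergence) to
    soft targets, mirroring the two losses named in the abstract.
    soft_targets and the weight alpha are placeholders, not taken from the paper."""
    ce = F.cross_entropy(logits, labels)
    kl = F.kl_div(F.log_softmax(logits, dim=-1), soft_targets,
                  reduction="batchmean")
    return ce + alpha * kl
```

Under these assumptions, a call such as similarity(compress_news(title, body), comment) yields a score in [-1, 1] that can be thresholded to filter comments related to the news text.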

Keywords: text similarity; keyword extraction; BERT; contrastive learning

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)
