Research on Text Similarity Based on Interactive Features and Multi-scale Features


Authors: YIN Chun-yong [1]; SHEN Zi-ning (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)

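Affiliation: [1] School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China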

Source: Computer Technology and Development, 2024, No. 8, pp. 86-92 (7 pages)

Funding: National Natural Science Foundation of China, General Program (6177282).

Abstract: Text similarity analysis often yields low accuracy because information is not passed between the two sentences and multiple kinds of semantic information are ignored. To address this problem, a novel text similarity model based on interactive features and multi-scale features (IF-MSF) is proposed, built on bidirectional long short-term memory (BiLSTM). First, BiLSTM encodes each sentence and extracts a global feature matrix; a soft attention mechanism and cosine similarity are then each used to let the two feature matrices interact, so that the semantic information inside each matrix is passed to the other. Second, the two groups of interactive features are weighted to combine all interaction information, and BiLSTM re-encodes the weighted interactive features together with the initial encoded features to capture the difference information between them. Third, multi-scale convolution extracts multiple semantic features from the difference information, and a channel attention mechanism strengthens the important features. Finally, the two groups of enhanced features are fused to judge whether the text pair is similar. In experiments on two datasets, the model achieves the highest F1 scores, 88.15% and 85.03% respectively, outperforming the other methods compared.

Keywords: text similarity; bidirectional long short-term memory; interactive features; multi-scale features; channel attention

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
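
The pipeline summarized in the abstract (interaction by soft attention and by cosine similarity, weighting of the two interactions, then multi-scale convolution with channel attention) can be illustrated with a small NumPy sketch. It is not the authors' implementation. The BiLSTM encoders are replaced by random feature matrices, the cosine interaction is assumed to use softmax-normalized cosine scores, the mixing weight `alpha` is a hypothetical scalar, the "difference information" is approximated by a plain subtraction instead of a second BiLSTM pass, and the channel attention is assumed to be a squeeze-and-excitation style gate. All function names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_attention_interaction(a, b):
    """Align each row of a (m, d) with rows of b (n, d), and vice versa."""
    scores = a @ b.T                                  # (m, n) dot-product scores
    return softmax(scores, axis=1) @ b, softmax(scores.T, axis=1) @ a

def cosine_interaction(a, b, eps=1e-8):
    """Same alignment, scored by cosine similarity (assumed form)."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    sim = a_unit @ b_unit.T                           # (m, n), values in [-1, 1]
    return softmax(sim, axis=1) @ b, softmax(sim.T, axis=1) @ a

def weighted_interaction(a, b, alpha=0.5):
    """Blend the two interactions; `alpha` is a hypothetical mixing weight."""
    att_a, att_b = soft_attention_interaction(a, b)
    cos_a, cos_b = cosine_interaction(a, b)
    return (alpha * att_a + (1 - alpha) * cos_a,
            alpha * att_b + (1 - alpha) * cos_b)

def conv1d_valid(x, w):
    """1-D convolution over time. x: (T, d); w: (k, d, c) -> (T - k + 1, c)."""
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    return np.maximum(np.einsum("tkd,kdc->tc", windows, w), 0.0)  # ReLU

def channel_attention(fmap, w1, w2):
    """Squeeze-and-excitation style gate. fmap: (T, c); w1: (c, r); w2: (r, c)."""
    squeezed = fmap.mean(axis=0)                      # (c,) per-channel summary
    hidden = np.maximum(squeezed @ w1, 0.0)           # (r,) bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # (c,) sigmoid gates
    return fmap * gates                               # rescale every channel

def multi_scale_features(x, conv_weights, se_weights):
    """Convolve at several kernel sizes, gate channels, max-pool over time."""
    pooled = []
    for w, (w1, w2) in zip(conv_weights, se_weights):
        fmap = channel_attention(conv1d_valid(x, w), w1, w2)
        pooled.append(fmap.max(axis=0))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))   # stand-in for the BiLSTM encoding of sentence 1
b = rng.normal(size=(7, 8))   # stand-in for the BiLSTM encoding of sentence 2
mix_a, mix_b = weighted_interaction(a, b)
diff_a = a - mix_a            # crude stand-in for the re-encoded difference

kernel_sizes, channels = (1, 2, 3), 4
conv_weights = [0.1 * rng.normal(size=(k, 8, channels)) for k in kernel_sizes]
se_weights = [(rng.normal(size=(channels, 2)), rng.normal(size=(2, channels)))
              for _ in kernel_sizes]
features = multi_scale_features(diff_a, conv_weights, se_weights)
print(features.shape)         # (12,): 3 kernel sizes x 4 channels
```

In a full model the same multi-scale branch would be applied to the second sentence as well, and the two enhanced feature vectors would be fused and passed to a classifier; those parts, and all trained weights, are omitted here.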

 
