Authors: YIN Chun-yong (尹春勇) [1]; SHEN Zi-ning (沈子宁) (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)
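Affiliation: [1] School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044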
Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 8, pp. 86-92 (7 pages)
Funding: General Program of the National Natural Science Foundation of China (6177282).
Abstract: Text similarity analysis often yields low accuracy because information is not passed between the two texts and multiple kinds of semantic information are ignored. To address this, a novel text similarity model based on interactive features and multi-scale features (IF-MSF) is proposed, built on the bidirectional long short-term memory network (BiLSTM). First, BiLSTM encodes each sentence and extracts a global feature matrix; the two feature matrices then interact through a soft attention mechanism and through cosine similarity, so that semantic information inside each matrix is passed to the other. Second, the two groups of interactive features are weighted to combine all interaction information, and BiLSTM re-encodes the weighted interactive features together with the initial encoded features to capture the difference information between them. Third, multi-scale convolution extracts multiple semantic features from the difference information, and a channel attention mechanism strengthens the important features. Finally, the two groups of enhanced features are fused to judge whether the text pair is similar. Experiments on two datasets show that the model achieves the highest F1 values, 88.15% and 85.03% respectively, outperforming the other methods compared.
Keywords: text similarity; bidirectional long short-term memory; interactive features; multi-scale features; channel attention
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
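
The interaction step summarized in the abstract (soft attention and cosine similarity between two encoded sentences, followed by a weighted combination) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record gives no formulas, so the dot-product alignment, the softmax over cosine scores, the mixing weight `alpha`, and all function names are assumptions made for illustration, and random matrices stand in for the BiLSTM outputs.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (len_a, d) and b (len_b, d)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T  # shape (len_a, len_b)


def interact(a, b, scores):
    """Align each sentence to the other using a (len_a, len_b) score matrix."""
    a_aligned = softmax(scores, axis=1) @ b    # each token of a attends over b
    b_aligned = softmax(scores, axis=0).T @ a  # each token of b attends over a
    return a_aligned, b_aligned


def weighted_interaction(a, b, alpha=0.5):
    """Mix the soft-attention and cosine interactions (alpha is hypothetical)."""
    a_att, b_att = interact(a, b, a @ b.T)               # dot-product soft attention
    a_cos, b_cos = interact(a, b, cosine_matrix(a, b))   # cosine-based interaction
    a_mix = alpha * a_att + (1.0 - alpha) * a_cos
    b_mix = alpha * b_att + (1.0 - alpha) * b_cos
    return a_mix, b_mix


# Random stand-ins for BiLSTM outputs: 5 and 7 tokens, hidden size 8.
rng = np.random.default_rng(0)
sent_a = rng.normal(size=(5, 8))
sent_b = rng.normal(size=(7, 8))
mix_a, mix_b = weighted_interaction(sent_a, sent_b)
print(mix_a.shape, mix_b.shape)
```

Each output keeps the shape of its own sentence, so it could be concatenated with the initial encoding before a re-encoding step such as the one the abstract describes.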