Topic Matching Algorithm Based on Multi-feature Fusion of Key Entities and Text Abstracts

Cited by: 1


Authors: JI Ke, ZHANG Xiu, MA Kun [1,2], SUN Runyuan [1,2], CHEN Zhenxiang [1,2], WU Jun [3]

Affiliations: [1] School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China; [2] Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan 250022, Shandong, China; [3] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Source: Journal of Zhengzhou University (Engineering Science), 2024, No. 2, pp. 51-59 (9 pages)

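Funding: National Natural Science Foundation of China (61702216, 61772231); Major Scientific and Technological Innovation Project of Shandong Province (2021CXGC010103).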

Abstract: With the rapid spread of the Internet, the volume of online news has grown sharply, and efficiently finding reports that match a specific topic has become a pressing problem. To address it, a topic matching algorithm based on multi-feature fusion of key entities and text abstracts is proposed. First, the W²NER model is used for named entity recognition, and key entities are extracted using five features: word frequency, TF-IDF, lexical cohesion, word-word similarity, and word-sentence similarity. Second, the Pegasus model is used for text summarization, and a BiLSTM fuses the key entity features with the text summary features to obtain deep semantic features of each news text. Next, a cross-attention mechanism performs feature interaction between the news articles to be matched, strengthening the connection between them. Finally, the deep semantic features and the text interaction features are fused to jointly decide whether the two texts match in topic. Comparative experiments against other algorithms on real data from Sohu show that the proposed algorithm achieves accuracy and precision comparable to theirs, while improving recall and F1 score.
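The key-entity extraction step in the abstract can be illustrated with a minimal sketch. It covers only two of the five features named there (word frequency and TF-IDF) and assumes the candidate entities have already been produced by an upstream NER model. The function name `rank_key_entities`, the smoothed IDF formula, and the equal 0.5/0.5 weighting are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def rank_key_entities(doc_entities, corpus_entities, top_k=3):
    """Rank one document's candidate entities by frequency and TF-IDF.

    doc_entities: entity strings recognized in the target document
                  (assumed output of an upstream NER model).
    corpus_entities: one entity list per corpus document,
                     including the target document itself.
    """
    counts = Counter(doc_entities)
    total = sum(counts.values())
    n_docs = len(corpus_entities)

    # Document frequency: in how many documents each entity appears.
    doc_freq = Counter()
    for entities in corpus_entities:
        doc_freq.update(set(entities))

    scores = {}
    for entity, count in counts.items():
        tf = count / total
        # Smoothed IDF (an assumed variant; the paper's formula is not given).
        idf = math.log((1 + n_docs) / (1 + doc_freq[entity])) + 1
        # Hypothetical equal-weight mix of the two features.
        scores[entity] = 0.5 * tf + 0.5 * tf * idf

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]


corpus = [
    ["Sohu", "Beijing", "Sohu", "Olympics"],
    ["Beijing", "Olympics"],
    ["Beijing", "stock market"],
]
print(rank_key_entities(corpus[0], corpus, top_k=2))
```

In this toy corpus, "Sohu" ranks first because it is both frequent in the document and rare elsewhere, while "Beijing" drops out of the top two because it appears in every document. A fuller version would add the cohesion and similarity features before ranking.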

Keywords: topic matching; key entities; text summarization; text matching; information retrieval

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
