General Text Matching Based on Topic Model (基于主题模型的通用文本匹配方法)

Authors: Huang Zhenye (黄振业) [1]; Mo Ganqing (莫淦清) [1]; Yu Keman (余可曼)

Affiliations: [1] School of Information Technology, Zhejiang Financial College, Hangzhou 310018, Zhejiang, China; [2] Hangzhou Pingzhi Information Technology Co., Ltd., Hangzhou 310030, Zhejiang, China

Source: Computer Applications and Software (《计算机应用与软件》), 2024, No. 5, pp. 310-318, 349 (10 pages)

Abstract: Similarity measurement between long and short texts has a growing number of application scenarios, and consistency checks on such text pairs can mostly be abstracted as a text-similarity comparison problem. The difficulty is that short text is sparse: it is hard to determine which domain it belongs to or what background knowledge applies, and it is also hard to introduce word embeddings to solve specific text-matching problems in general scenarios. To address this, the paper proposes a lightweight method based on a topic model with text clustering, which matches general long and short texts without using additional background knowledge. Experimental results on two typical test datasets show that the text similarity detection efficiency of the proposed method is very high.

Keywords: natural language processing; text matching; topic model; Gibbs sampling

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
