Research on a Fast Chinese Short Text Retrieval Scheme Based on HBase (基于HBase的中文短文本快速检索方案研究)

Authors: ZHAO Hang (赵航); YIN Tieyuan (尹铁源) [1] (Shenyang University of Technology, Shenyang, Liaoning 110870)

Source: Changjiang Information & Communications (《长江信息通信》), 2024, No. 3, pp. 125-129 (5 pages)

Abstract: With the rapid development of the information age, the volume of information that every industry must process has multiplied. Massive data sets are better computed and stored in a fully distributed environment, but retrieval of Chinese short text data in such an environment remains inefficient. To address this, the article designs a fast Chinese short text retrieval scheme based on HBase. First, the corresponding topic probability distributions are trained with BTM. Second, traditional KNN text classification is combined with latent Chinese semantic analysis to classify short texts by latent topic, and a MapReduce-based parallel KNN topic classifier is designed to handle the heavy computation that massive data requires. Finally, the topic classification results are combined with Top Hits on ES to build a secondary index for the corresponding table, which avoids complex full table scans of the original text data and so enables fast retrieval. Experimental comparison shows that the scheme is more efficient than the traditional HBase approach to retrieving Chinese short text data.
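The pipeline summarized in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: BTM training is not shown (the topic vectors are assumed to exist already), cosine similarity is an assumed distance measure that the abstract does not specify, the "map" and "reduce" steps are plain Python functions rather than MapReduce jobs, and an in-memory dictionary stands in for the secondary index that the article builds with ES and HBase. All function names are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def cosine(p, q):
    """Cosine similarity between two topic probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def map_partition(partition, query, k):
    """Map step: local top-k (similarity, label) pairs within one partition."""
    scored = ((cosine(vec, query), label) for vec, label in partition)
    return heapq.nlargest(k, scored)


def reduce_topk(partials, k):
    """Reduce step: merge local top-k lists, then take a majority vote."""
    merged = heapq.nlargest(k, (item for part in partials for item in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def knn_topic(partitions, query, k=3):
    """Classify one topic vector by KNN over all labeled partitions."""
    partials = [map_partition(part, query, k) for part in partitions]
    return reduce_topk(partials, k)


def build_topic_index(rows, partitions, k=3):
    """Toy secondary index: predicted topic label -> list of row keys.

    A lookup by topic then touches only the listed row keys instead of
    scanning every row of the primary table.
    """
    index = defaultdict(list)
    for row_key, vec in rows:
        index[knn_topic(partitions, vec, k)].append(row_key)
    return dict(index)
```

Calling `build_topic_index(rows, partitions)` with `rows` as `(row_key, topic_vector)` pairs and `partitions` as lists of `(topic_vector, label)` pairs returns a mapping from topic label to row keys. Keeping only the local top-k in each partition is what makes the merge cheap: the global k nearest neighbors are always contained in the union of the per-partition top-k lists.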

Keywords: HBase; BTM; KNN; text classification

Classification No.: TP393 [Automation and Computer Technology: Computer Application Technology]
