面向服务聚类的短文本优化主题模型  Cited by: 3

Short text optimized topic model for service clustering


Authors: LU Jia-wei (陆佳炜) [1,2]; ZHENG Jia-hong (郑嘉弘); LI Duan-ni (李端倪); XU Jun (徐俊); XIAO Gang (肖刚) [1,2]

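Affiliations: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; [2] College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, China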

Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》), 2022, No. 12, pp. 2416-2425, 2444 (11 pages)

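Funding: National Natural Science Foundation of China (61976193); National Social Science Foundation of China (22BMZ038); Zhejiang Provincial Natural Science Foundation (LY19F020034); Key Research and Development Program of Zhejiang Province (2021C03136).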

Abstract: A biterm topic model with word vector and noise filtering (BTM-VN) was proposed in order to mine high-quality latent topics, improve the accuracy of service clustering, and solve the semantic sparsity and noise problems caused by the short text of service description documents. Based on biterms, BTM-VN expanded the service description documents and obtained additional semantic information. A strategy for calculating the probability of representative biterms from topic distribution information was designed. By calculating a representative biterm matrix during sampling, the weight of the representative biterms in the current topic was increased, which reduced the interference of noise words when extracting the topics of service description documents. Moreover, word vectors were used to filter the set of biterms to be trained, reducing the number of biterms with low co-occurrence meaning and addressing the long running time of the biterm topic model. Finally, an optimized density peak clustering algorithm was used to cluster the service topic distribution matrix trained by BTM-VN. Experimental results show that the service clustering method based on BTM-VN outperforms existing service clustering methods on a real-world dataset according to three clustering evaluation metrics.
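The biterm expansion and word-vector filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, the use of cosine similarity as the filtering criterion, and the threshold value are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from itertools import combinations


def extract_biterms(tokens):
    """Return unordered word pairs (biterms) from one short document.

    Simplification (assumption): every pair of distinct words in the
    document forms a biterm, with no window limit.
    """
    biterms = []
    for w1, w2 in combinations(tokens, 2):
        if w1 != w2:
            # Sort each pair so that (a, b) and (b, a) are the same biterm.
            biterms.append(tuple(sorted((w1, w2))))
    return biterms


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_biterms(biterms, vectors, threshold):
    """Keep biterms whose two words have similarity >= threshold.

    Assumption: pairs with a word missing from `vectors` are dropped.
    """
    kept = []
    for w1, w2 in biterms:
        if w1 in vectors and w2 in vectors:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                kept.append((w1, w2))
    return kept


# Toy word vectors (hypothetical values, not trained embeddings).
toy_vectors = {
    "weather": [1.0, 0.0],
    "forecast": [0.9, 0.1],
    "payment": [0.0, 1.0],
}

doc = ["weather", "forecast", "payment"]
all_biterms = extract_biterms(doc)
kept_biterms = filter_biterms(all_biterms, toy_vectors, threshold=0.5)
```

With these toy vectors, only the ("forecast", "weather") pair passes the threshold, so the set of biterms passed to topic-model training shrinks from three pairs to one. In practice the vectors would come from a trained word embedding model, and the threshold would need tuning.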

Keywords: service clustering; topic model; short text optimization; representative biterm; word vector

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
