Vector Space Model Based on Semantic Similarity of Point Mutual Information (cited by: 3)


Authors: NIU Fenggao (牛奉高)[1], ZHAO Xia (赵霞), XU Qianli (徐倩丽) (School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China)

Affiliation: [1] School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006

Source: Journal of Shanxi University (Natural Science Edition), 2021, No. 2, pp. 220-228 (9 pages)

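Funding: Shanxi Province Applied Basic Research Program (201801D211002); Shanxi Province Higher Education Outstanding Achievement Cultivation Project (2019KJ004); National Natural Science Foundation of China (71503151).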

Abstract: To address the insufficient extraction of semantic information in text representation models, this paper proposes a pointwise-mutual-information-based CLSVSM (Co-occurrence Latent Semantic Vector Space Model) and a semantically enhanced CLSVSM. First, the semantic similarity between keywords is computed with pointwise mutual information (PMI), yielding the PMI-based CLSVSM. Second, the keyword weights are corrected by latent semantic analysis, yielding the semantically enhanced CLSVSM; this remedies the original model's shortcoming of leaving the weights of existing keywords unchanged. Both new models are compared experimentally with CLSVSM and word2vec. The results show that the PMI-based CLSVSM achieves clustering performance comparable to the original CLSVSM and better than word2vec. The semantically enhanced CLSVSM clearly outperforms the other models in clustering accuracy: taking the F1 value as an example, it exceeds CLSVSM by 2%, 9.2%, and 12.3% on the three datasets, respectively, and it also clearly outperforms word2vec. Its better clustering performance can effectively improve the accuracy of information retrieval and text clustering and reduce retrieval costs.
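The abstract names two building blocks: pointwise mutual information between keywords, and latent semantic analysis (LSA) for correcting keyword weights. The sketch below shows standard textbook versions of both in Python with NumPy; it is not the authors' code. Assumptions: probabilities are estimated from how often keywords occur and co-occur across documents, negative PMI values are clipped to zero (positive PMI), and "correcting keyword weights with LSA" is read as a rank-k truncated-SVD reconstruction of the document-keyword matrix. The function names `pmi_similarity` and `lsa_reweight` and the toy corpus are hypothetical. The abstract does not say how these pieces are assembled into CLSVSM, so that step is not shown.

```python
import numpy as np


def pmi_similarity(doc_term, positive=True):
    """Keyword-by-keyword pointwise mutual information from document co-occurrence.

    doc_term: (n_docs x n_terms) matrix; any nonzero entry counts as "keyword
    occurs in document". Probabilities are document frequencies (an assumption;
    the paper's estimator and smoothing are not given in the abstract).
    """
    X = (np.asarray(doc_term) > 0).astype(float)
    n_docs = X.shape[0]
    p_term = X.sum(axis=0) / n_docs      # p(a): fraction of documents containing keyword a
    p_joint = (X.T @ X) / n_docs         # p(a, b): fraction containing both a and b
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / np.outer(p_term, p_term))
    # Pairs that never co-occur give log(0) = -inf; unused keywords give nan.
    # Treat both as "no association".
    pmi[~np.isfinite(pmi)] = 0.0
    if positive:
        pmi = np.maximum(pmi, 0.0)       # positive PMI: drop negative associations
    return pmi


def lsa_reweight(doc_term, k):
    """Rank-k truncated-SVD (LSA) reconstruction of a document-keyword weight matrix.

    Hypothetical reading of "correcting keyword weights with LSA": every weight,
    including those of keywords a document already contains, is replaced by its
    value in the best rank-k approximation of the matrix.
    """
    A = np.asarray(doc_term, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, s.size)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


if __name__ == "__main__":
    # Toy corpus: 4 documents x 4 keywords (made-up data for illustration).
    docs = np.array([[1, 1, 0, 0],
                     [1, 1, 1, 0],
                     [0, 0, 1, 1],
                     [0, 1, 1, 1]])
    print(np.round(pmi_similarity(docs), 3))
    print(np.round(lsa_reweight(docs, k=2), 3))
```

With k below the rank of the matrix, the reconstruction changes the weights of keywords a document already contains and gives nonzero weight to related keywords it lacks, which is one plausible way to address the limitation the abstract attributes to the original CLSVSM.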

Keywords: CLSVSM; semantic enhancement; pointwise mutual information; document clustering

CLC numbers: O213.9 [Science: Probability Theory and Mathematical Statistics]; G354 [Science: Mathematics]

 
