Authors: 牛奉高[1], 冯世佳, 黄琛 (NIU Feng-gao; FENG Shi-jia; HUANG Chen), School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China
Source: 《计算机与现代化》 (Computer and Modernization), 2021, No. 5, pp. 66-72 (7 pages)
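Funding: Shanxi Province Applied Basic Research Program (Outstanding Youth Fund) (201801D211002); National Statistical Science Research Project (2017LY04); Shanxi Province Higher Education Outstanding Achievement Cultivation Project (2019KJ004).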
Abstract: A sound representation of text information plays an important role in text topic clustering and retrieval. To address the high dimensionality of text representation models, this paper studies penalized matrix decomposition (PMD) on the basis of the co-occurrence latent semantic vector space model (CLSVSM). PMD imposes sparsity constraints on the vectors to extract core feature words, from which the original data are reconstructed. Using co-occurrence analysis theory together with PMD, the semantic information between feature words is mined in depth and a semantic kernel function (PMD_K) is constructed. The proposed methods are applied to text topic clustering. The experimental results show that both PMD and PMD_K cluster clearly better than the other methods; taking the F value as an example, PMD_K improves the F value by 21.9% over the earlier 95% CLSVSM_K method. Combining PMD with a text representation model improves the efficiency and accuracy of text topic clustering while avoiding complex operations on high-dimensional matrices.
Keywords: CLSVSM; penalized matrix decomposition; semantic kernel function; text topic clustering
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]