Authors: Wang Yibin (王一宾); Zheng Weijie (郑伟杰); Cheng Yusheng (程玉胜); Cao Tiancheng (曹天成)
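Affiliations: [1] School of Computer and Information, Anqing Normal University, Anqing 246133, China; [2] The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing Normal University, Anqing 246133, China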
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2021, No. 1, pp. 75-89 (15 pages)
Funding: National Natural Science Foundation of China (617022012).
Abstract: Most multi-label algorithms use feature and label embedding to mine the semantic information of the label space, but these methods do not exploit the relationships that may exist between features and labels. Label-specific features give a better account of this relationship, namely that each label may correspond to its own set of features; work in this line improves algorithms mainly by using correlations among labels and among features and by reshaping the label space. However, such methods neither state the logical relationship that may exist between features and labels nor establish whether the same kind of relationship holds between labels and instances. How to use these two kinds of semantic information to improve the performance of multi-label algorithms is worth studying. This paper therefore proposes a new multi-label classification algorithm that uses PLSA (Probabilistic Latent Semantic Analysis) to learn the semantic information of probability distributions. First, the sample matrix is assumed to contain a latent variable that acts as the label; the PLSA model is used to obtain the feature-label and label-instance conditional probability distribution matrices, which express the possible relationships in the form of conditional probability distributions. Second, a model is built to learn the semantic information in these probability distribution matrices and is applied to label prediction and classification in the multi-label algorithm. Finally, experiments and statistical hypothesis tests are carried out on 13 public multi-label text datasets, and the proposed algorithm is compared with other multi-label classification algorithms. The experimental results indicate that learning the semantic information of probability distributions is a reasonable way to improve the performance of multi-label algorithms.
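The PLSA step mentioned in the abstract can be illustrated with a minimal sketch of the standard asymmetric PLSA model fitted by EM. This is not the authors' implementation: the function name `plsa`, its parameters, and the toy matrix are assumptions made for illustration, and the sketch covers only the estimation of the two conditional probability matrices, not the later label prediction stage.

```python
import random


def plsa(counts, n_latent, n_iter=50, seed=0):
    """Fit asymmetric PLSA by EM on a nonnegative instance-by-feature matrix.

    Hypothetical helper for illustration only. Returns (p_w_z, p_z_d) where
    p_w_z[z][w] = P(feature w | latent z) and p_z_d[d][z] = P(latent z | instance d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # Scale a row to sum to 1; fall back to uniform if the row is all zeros.
        total = sum(row)
        if total > 0:
            return [v / total for v in row]
        return [1.0 / len(row)] * len(row)

    # Random positive initialisation of both conditional distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_latent)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_latent)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_latent)]
        new_z_d = [[0.0] * n_latent for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(w | z) P(z | d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_latent)])
                # Accumulate expected counts used by the M-step.
                for z in range(n_latent):
                    new_w_z[z][w] += n_dw * post[z]
                    new_z_d[d][z] += n_dw * post[z]
        # M-step: renormalise expected counts into conditional distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy instance-by-feature counts (made-up data, two rough blocks of features).
toy_counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
feature_given_latent, latent_given_instance = plsa(toy_counts, n_latent=2)
for row in latent_given_instance:
    print([round(p, 3) for p in row])
```

In this sketch the latent variable plays the role that the abstract assigns to the label, so `feature_given_latent` and `latent_given_instance` loosely correspond to the feature-label and label-instance matrices. How the paper maps latent variables onto the actual labels is not stated in this record and is left out here.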
Classification code: TP39 [Automation and Computer Technology - Computer Application Technology]