Authors: Li Zhuting (李竹婷); Chen Xiuhong (陈秀宏)[1]; Sun Huiqiang (孙慧强). School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
Affiliation: [1] School of Digital Media, Jiangnan University
Source: Application Research of Computers (计算机应用研究), 2019, No. 8, pp. 2415-2420 (6 pages)
Funding: National Natural Science Foundation of China (61373055); 2017 Jiangsu Province Postgraduate Research and Innovation Program (KYCX17_1500)
Abstract: Most existing principal component analysis (PCA) methods for symbolic data replace each symbolic datum with only part of its representative information. To address this shortcoming, this paper proposes a PCA method for histogram symbolic data. Because histogram data represent symbolic data as probability distributions, they describe the data more completely and accurately.

Exploiting this property, each histogram is represented by its quantile function, and the Wasserstein distance, which fully accounts for the probability distribution of the histogram data, is introduced to compute the covariance matrix of the histogram variables, on which the PCA is then performed. However, the eigenvectors corresponding to the first m largest eigenvalues obtained in this way are not necessarily non-negative, so when a principal component is expressed through the quantile functions, it is not guaranteed to be a quantile function itself.

To resolve this, the method adopts the idea of the DSD (distribution and symmetric distribution) regression model: it defines a corresponding symmetric distribution variable for each histogram variable and obtains all principal components with non-negative coefficients from the generalized covariance matrix under the Wasserstein distance. Experiments demonstrate the effectiveness of the algorithm. The method also overcomes the drawback that histogram PCA coefficients may be negative, and it retains more of the information in the original data.
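The abstract outlines a pipeline: histogram, quantile function, Wasserstein distance, covariance matrix, eigen-decomposition, plus the DSD symmetric-variable idea. The NumPy sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. Assumptions: values are uniform within each bin and every bin weight is positive; the covariance is taken as the average integrated product of barycenter-centred quantile functions (a common Wasserstein-based definition, which may differ from the paper's "generalized covariance matrix"); integrals over t in [0, 1] are approximated on a midpoint grid. All function names and the example data are hypothetical. Because the abstract does not specify how the non-negative coefficients are derived, the sketch stops at showing why negative coefficients can break the quantile-function property while non-negative combinations with symmetric variables preserve it.

```python
import numpy as np

def quantile_function(edges, weights, t):
    # Hypothetical helper: piecewise-linear quantile function of a histogram,
    # assuming values are uniform within each bin and all weights are > 0.
    edges = np.asarray(edges, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(weights)))
    cum /= cum[-1]  # normalise so the cumulative weights end exactly at 1
    return np.interp(t, cum, edges)

def t_grid(m=1000):
    # Midpoint grid on (0, 1); averages over it approximate integrals on [0, 1].
    return (np.arange(m) + 0.5) / m

def wasserstein_l2(q1, q2):
    # L2 Wasserstein distance between two distributions, given their quantile
    # functions sampled on the same midpoint grid.
    return np.sqrt(np.mean((q1 - q2) ** 2))

def quantile_array(data, t):
    # data[i][j] = (edges, weights) for object i and histogram variable j.
    # Returns Q with shape (n_objects, n_variables, len(t)).
    return np.array([[quantile_function(e, w, t) for (e, w) in row] for row in data])

def wasserstein_covariance(Q):
    # Assumed definition: centre each variable on its barycenter (the mean
    # quantile function), then average the integrated cross-products over objects.
    centred = Q - Q.mean(axis=0, keepdims=True)
    n, _, m = Q.shape
    # Sum over objects i and grid points k; dividing by n*m gives the averages.
    return np.einsum("ijk,ilk->jl", centred, centred) / (n * m)

def histogram_pca(Q):
    # Eigen-decomposition of the symmetric covariance matrix, largest first.
    eigvals, eigvecs = np.linalg.eigh(wasserstein_covariance(Q))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def symmetric_variable(q):
    # DSD idea: the symmetric distribution of X has quantile function -Psi_X(1 - t).
    # On the midpoint grid, 1 - t is the same grid in reverse order.
    return -q[..., ::-1]

def is_quantile_function(q, tol=1e-12):
    # A valid quantile function must be non-decreasing in t.
    return bool(np.all(np.diff(q) >= -tol))

# Hypothetical data: 4 objects, 2 histogram variables, (bin edges, bin weights).
t = t_grid()
data = [
    [([0, 1, 2], [0.5, 0.5]), ([10, 12, 15], [0.3, 0.7])],
    [([1, 2, 4], [0.2, 0.8]), ([9, 11, 13], [0.6, 0.4])],
    [([2, 3, 5], [0.7, 0.3]), ([12, 14, 18], [0.5, 0.5])],
    [([0, 2, 3], [0.4, 0.6]), ([8, 10, 11], [0.1, 0.9])],
]
Q = quantile_array(data, t)
eigvals, eigvecs = histogram_pca(Q)

# A negative coefficient can make the combination decrease in t (not a quantile
# function); a non-negative coefficient on the symmetric variable cannot.
q_a, q_b = Q[0, 0], Q[0, 1]
print(is_quantile_function(q_a - q_b))
print(is_quantile_function(q_a + symmetric_variable(q_b)))
```

The check at the end reflects the reasoning behind the DSD construction: -Psi_X(1 - t) is non-decreasing whenever Psi_X(t) is, so any combination with non-negative coefficients on the original and symmetric variables remains a valid quantile function.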
Keywords: principal component analysis; histogram data; quantile function; Wasserstein distance; covariance matrix
Classification: TP391.4 [Automation and Computer Technology / Computer Application Technology]