Adapted Expectation Maximization Algorithm for Gaussian Mixture Clustering With Censored Data (嵌套删失数据期望最大化的高斯混合聚类算法)  Cited by: 5

Authors: YU Hai-Yan (余海燕); CHEN Jing-Jing (陈京京) [1]; QIU Hang (邱航); WANG Yong (王永) [1]; WANG Ruo-Fan (王若凡)

Affiliations: [1] Chongqing Key Laboratory of Electronic Commerce and Modern Logistics, Chongqing University of Posts and Telecommunications, Chongqing 404615 [2] School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731 [3] Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731 [4] School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin 300222

Source: Acta Automatica Sinica (《自动化学报》), 2021, No. 6, pp. 1302-1314 (13 pages)

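Funding: Supported by the National Natural Science Foundation of China (71601026, 61601331, 71571105); the Chongqing Major Industrial Theme Special Project (cstc2017zdcy-zdzxX0013); the Sichuan Province Key Research and Development Program (2018SZ0114, 2019YFS0271); and the Tianjin Natural Science Foundation Youth Project (18JCQNJC04700).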

Abstract: To address clustering with data that are missing not at random, this paper analyzes the efficiency of an adapted expectation-maximization (EM) algorithm for the Gaussian mixture clustering model with censored data, and reveals how the likelihood function of censored data affects the clustering model and its estimation algorithm. Using Akaike's information criterion and the Kullback-Leibler divergence, the proposed algorithm is compared with the standard EM algorithm. Based on a partition of the censored data set and its indicator variables, the posterior probability and likelihood function of the model parameters are derived, and the first and second moments of the truncated normal function are adjusted accordingly. Following the theory of the efficient influence function, the optimal parameters are obtained from the equation on the expectation of the score vector. On the same censored data set, the proposed clustering algorithm estimates the cluster centroids more accurately. Experimental results show that, for Gaussian mixture clustering, the proposed algorithm outperforms the standard EM algorithm, which is designed for data missing at random.
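The abstract refers to the first and second moments of a truncated normal, which are the quantities a censored-data E-step typically substitutes for an unobserved value. The following is a minimal sketch of those textbook formulas only, not the paper's implementation; the function name `truncated_normal_moments` and its parameters are hypothetical, and the formulas lose precision when the truncation interval lies far in a tail.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_moments(mu, sigma, a=-math.inf, b=math.inf):
    """Return (E[X], E[X^2]) for X ~ N(mu, sigma^2) conditioned on a < X < b.

    Hypothetical helper: a right-censored observation at threshold c
    corresponds to the interval (c, +inf), a left-censored one to (-inf, c).
    """
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    mass = _cdf(beta) - _cdf(alpha)  # probability of the interval
    if mass <= 0.0:
        raise ValueError("truncation interval has no probability mass")

    pa, pb = _pdf(alpha), _pdf(beta)
    # z * pdf(z) tends to 0 at infinite bounds; avoid inf * 0 = nan.
    za = alpha * pa if math.isfinite(alpha) else 0.0
    zb = beta * pb if math.isfinite(beta) else 0.0

    shift = (pa - pb) / mass
    mean = mu + sigma * shift
    var = sigma ** 2 * (1.0 + (za - zb) / mass - shift ** 2)
    return mean, var + mean ** 2


if __name__ == "__main__":
    # Standard normal right-censored at 0: moments of the half-normal.
    print(truncated_normal_moments(0.0, 1.0, a=0.0))
```

In an EM loop for a one-dimensional Gaussian mixture, these conditional moments would be computed per component for each censored point and weighted by the posterior responsibilities when updating the means and variances. That wiring is an assumption about the general technique, not a description of the authors' derivation.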

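Keywords: Gaussian mixture clustering; censored data; expectation-maximization algorithm; truncated normal function; second-moment estimator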

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
