Dimension Reduction for Longitudinal Data Based on Martingale Difference Divergence  (Cited by: 2)


Authors: WANG Hongxia (汪红霞) [1]; FANG Liyun (房丽云); BU Shijie (卜士杰); XU Peirong (许佩蓉)

Affiliations: [1] School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China; [2] College of Mathematics and Sciences, Shanghai Normal University, Shanghai 200233, China; [3] School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Chinese Journal of Applied Probability and Statistics (《应用概率统计》), 2023, No. 1, pp. 132-158 (27 pages)

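Funding: XU Peirong is supported by the General Program of the National Natural Science Foundation of China (Grant No. 11971018) and the Shanghai Rising-Star Program (Grant No. 20QA1407500). Additional support: General Project of the National Social Science Fund of China (Grant No. 22BTJ021) and the "Qing Lan Project" of Jiangsu universities.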

Abstract: Correlation among variables and within-subject correlation across repeated observations are two inherent characteristics of longitudinal datasets, and both carry important information about the data. Borrowing the idea of dimension reduction for matrix-valued data, this paper uses these two kinds of correlation to reduce the dimension of longitudinal data and proposes a sufficient dimension folding method based on martingale difference divergence. In theory, the population version of the criterion recovers the central mean dimension folding subspace, reducing the time dimension and the variable dimension simultaneously, and the estimator of this subspace obtained from the sample version is root-n consistent. Computationally, a Kronecker product assumption turns the reduction into a constrained low-dimensional optimization problem that can be solved quickly with existing nonlinear optimization algorithms. A consistent BIC criterion is further proposed to determine the structural dimension adaptively. Simulation studies show that, compared with dimension reduction methods in the literature, the proposed method is fast to compute and more accurate both in estimating the central mean dimension folding subspace and in determining the structural dimension. Finally, an analysis of clinical data on primary biliary cirrhosis illustrates the effectiveness of the proposed method.

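Keywords: longitudinal data; martingale difference divergence; sufficient dimension reduction; dimension folding; central mean subspace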

Classification: O212.4 [Science: Probability Theory and Mathematical Statistics]
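
Illustration: the abstract refers to two ingredients, martingale difference divergence and a Kronecker product structure for folding a variables-by-time matrix. The sketch below is not the authors' estimator or objective function; it only shows, under stated assumptions, the standard sample form of squared martingale difference divergence for a scalar response and the identity vec(AᵀXB) = (B ⊗ A)ᵀ vec(X) that links folding to a Kronecker product. The function names `sample_mdd_sq` and `fold`, the simulated data, and the choice of directions are all hypothetical.

```python
import numpy as np


def sample_mdd_sq(x, y):
    """Sample squared martingale difference divergence of scalar y given x.

    Assumed form: -(1/n^2) * sum_{k,l} (y_k - ybar)(y_l - ybar) * ||x_k - x_l||.
    Its population version is zero when E[y | x] does not depend on x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    # Pairwise Euclidean distances between the rows of x.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    yc = y - y.mean()
    return -(yc @ dist @ yc) / n**2


def fold(X, A, B):
    """Fold each p x T matrix X[i] into A^T X[i] B (left and right reduction)."""
    return np.einsum("pa,npt,tb->nab", A, X, B)


# Simulated data (hypothetical): n subjects, p variables, T time points.
rng = np.random.default_rng(0)
n, p, T = 500, 4, 3
X = rng.standard_normal((n, p, T))

# Kronecker identity with column-major vec: vec(A^T X B) = (B kron A)^T vec(X).
A = rng.standard_normal((p, 2))
B = rng.standard_normal((T, 2))
vec_folded = (A.T @ X[0] @ B).flatten(order="F")
kron_side = np.kron(B, A).T @ X[0].flatten(order="F")

# The response mean depends on X only through a^T X b (first variable, first time).
a = np.array([[1.0], [0.0], [0.0], [0.0]])
b = np.array([[1.0], [0.0], [0.0]])
a_other = np.array([[0.0], [1.0], [0.0], [0.0]])  # an uninformative direction
y = fold(X, a, b)[:, 0, 0] + 0.1 * rng.standard_normal(n)

mdd_true = sample_mdd_sq(fold(X, a, b).reshape(n, -1), y)
mdd_other = sample_mdd_sq(fold(X, a_other, b).reshape(n, -1), y)
print("MDD^2 with informative folding:  ", mdd_true)
print("MDD^2 with uninformative folding:", mdd_other)
```

In this toy setting the folded predictor built from the informative directions yields a clearly larger sample divergence than the uninformative one, which is the intuition behind searching over the left and right directions to maximize such a criterion. A real implementation would need orthogonality constraints and an optimizer, which are omitted here.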

 
