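Affiliations: [1] Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Shanxi University, Taiyuan 030006; [2] School of Applied Mathematics, Shanxi University of Finance and Economics, Taiyuan 030006; [3] School of Economics and Management, Shanxi University, Taiyuan 030006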
Source: Chinese Journal of Computers (《计算机学报》), 2016, No. 1, pp. 1-18 (18 pages)
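Funding: Supported by the National Natural Science Foundation of China (61432011; U1435212; 71301090), the National Key Basic Research Program of China ("973" Program) (2013CB329404), and the Program for Supporting Innovative Talents in Higher Education Institutions of Shanxi Province (2013052006)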
Abstract: In the era of big data, correlation analysis has attracted wide attention because it can reveal the inherent relationships between things quickly and efficiently, and it has been applied effectively in fields such as recommender systems, business analytics, public administration and medical diagnosis. Big data typically exhibits complex characteristics such as nonlinearity and high dimensionality. Taking these characteristics into account, together with a semantic analysis of existing correlation analysis methods, this paper reviews existing research on correlation analysis for big data from four aspects: statistical correlation analysis, mutual information, matrix computation and distance. After summarizing the classical correlation analysis theory of statistics, the paper first elaborates on the mutual-information-based theory of nonlinear correlation between two random variables from the perspective of generality and equitability on large-scale data. It then analyzes correlation coefficients based on matrix computation in terms of their computability on high-dimensional data, and examines distance-based correlation coefficients in terms of the complex structure of nonlinear, high-dimensional data. Finally, building on an analysis and comparison of existing methods, it discusses the research challenges of correlation analysis for big data with respect to high-dimensional data, multivariate data, large-scale data, incremental (growing) data, and their computability.
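The abstract contrasts classical statistical correlation with measures designed to capture nonlinear dependence, including distance-based correlation coefficients. As a minimal illustration only (not the paper's own method or code), the sketch below compares the Pearson coefficient with the sample distance correlation as defined by Székely, Rizzo and Bakirov, in its V-statistic form, on a purely quadratic relationship. The function names `pearson`, `_double_centered` and `distance_correlation` are hypothetical; inputs are assumed to be equal-length lists of scalars, and the O(n²) pairwise computation is meant for clarity, not for the large-scale setting the abstract discusses.

```python
import math


def pearson(x, y):
    """Pearson correlation: measures only *linear* association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _double_centered(v):
    """Double-centered matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    # d is symmetric, so row means equal column means
    mean = [sum(row) / n for row in d]
    grand = sum(mean) / n
    return [[d[i][j] - mean[i] - mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two scalar samples.

    Illustrative O(n^2) sketch; assumes equal-length lists of numbers.
    """
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)

    def avg_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = max(avg_product(a, b), 0.0)  # clamp tiny negative rounding error
    dvar_x, dvar_y = avg_product(a, a), avg_product(b, b)
    if dvar_x == 0.0 or dvar_y == 0.0:
        return 0.0  # a constant sample has zero distance variance
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))


if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]  # symmetric grid on [-2, 2]
    ys = [v * v for v in xs]               # y depends on x, but not linearly
    print(f"Pearson r:            {pearson(xs, ys):.4f}")
    print(f"Distance correlation: {distance_correlation(xs, ys):.4f}")
```

On this symmetric grid, Pearson's r is essentially zero even though y is fully determined by x, whereas the distance correlation is clearly positive. In the population setting, distance correlation is zero only when the two variables are independent (given finite first moments), which is why distance-based coefficients are attractive for nonlinear data.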
CLC number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]