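Authors: MA Chunlai (马春来)[1], SHAN Hong (单洪)[1], MA Tao (马涛)[1,2], SHI Yingchun (史英春)[1]
Affiliations: [1] Electronic Engineering Institute, Hefei 230037; [2] Key Laboratory of Communication Information Control and Security Technology, Jiaxing, Zhejiang 314001
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2016, No. 12, pp. 2708-2712 (5 pages)
Funding: Supported by the National Defense Key Laboratory Fund (9140C130104)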
Abstract: Inferring social ties among users of location-based services (LBS) from their location information is an emerging problem in intelligence mining from location big data, and it can provide supporting information for group discovery and community detection. Based on spatio-temporal co-occurrence theory, this paper selects, summarizes, and optimizes four categories of features of co-occurrence regions. To address the difficulty that random forests have in classifying high-dimensional data with redundant features, a random forest algorithm based on a stratified sampling strategy over the feature space is proposed. The algorithm measures feature importance with the Fisher ratio, partitions the feature subspace on that basis, samples from the partitions proportionally, and then constructs the random forest. This improvement effectively avoids the noise that is easily introduced when feature subspaces are constructed by random sampling. Experimental results show that, compared with the standard random forest algorithm, the improved algorithm is more effective at classifying high-dimensional data with redundant features and is better suited to inferring the social ties of LBS users.
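The feature-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas or partition rule, so everything below is an assumption made for illustration: a two-class Fisher ratio of the form (m0 - m1)^2 / (v0 + v1), a two-way split into "strong" and "weak" features at the mean score, allocation proportional to partition size with at least one strong feature per subspace, and the hypothetical names `fisher_ratio`, `partition_features`, and `stratified_subspace`. Tree construction is omitted; each sampled subspace would be handed to one tree of the forest.

```python
import random
from statistics import mean, pvariance

def fisher_ratio(values, labels, eps=1e-12):
    """Two-class Fisher ratio (assumed form): (m0 - m1)^2 / (v0 + v1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    # eps guards against division by zero when both classes are constant.
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + eps)

def partition_features(scores):
    """Split feature indices at the mean score (assumed partition rule)."""
    threshold = mean(scores)
    strong = [j for j, s in enumerate(scores) if s > threshold]
    weak = [j for j, s in enumerate(scores) if s <= threshold]
    return strong, weak

def stratified_subspace(strong, weak, size, rng):
    """Draw `size` feature indices, allocating proportionally to partition
    sizes while keeping at least one strong feature when one exists."""
    total = len(strong) + len(weak)
    n_strong = round(size * len(strong) / total)
    if strong:
        n_strong = max(1, n_strong)
    n_strong = min(n_strong, len(strong), size)
    n_weak = min(size - n_strong, len(weak))
    return sorted(rng.sample(strong, n_strong) + rng.sample(weak, n_weak))

# Toy data (made up for illustration): 6 samples, 4 features.
# Feature 0 separates the classes; features 1-3 are close to noise.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    [0.5, 1.5, 1.0, 1.0, 0.5, 1.5],
    [2.0, 0.0, 1.0, 1.0, 2.0, 0.0],
    [0.2, 0.8, 0.5, 0.6, 0.3, 0.9],
]

scores = [fisher_ratio(col, labels) for col in columns]
strong, weak = partition_features(scores)

rng = random.Random(0)
# One feature subspace per tree; here, 5 trees with 2 features each.
subspaces = [stratified_subspace(strong, weak, 2, rng) for _ in range(5)]
print(strong, weak, subspaces)
```

With purely random feature sampling, a subspace of two features drawn from these four would miss the only informative feature half of the time; under the assumed stratified rule every subspace contains it, which is the noise-avoidance effect the abstract attributes to the improved algorithm.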
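Keywords: location-based services; spatio-temporal co-occurrence; random forest; stratified sampling; social tie inference
Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]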