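Affiliation: [1] College of Civil and Transportation Engineering, Hohai University (河海大学), Nanjing, Jiangsu 210098, China
Source: Journal of Transport Science and Engineering (《交通科学与工程》), 2014, No. 4, pp. 77-82 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK2011745)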
Abstract: To address the imbalanced distribution of congestion data, a new resampling method is proposed: the cross-combination resampling method. It combines random undersampling with SMOTE and cross-samples the original data, so as to reduce the uneven damage that resampling does to the original data. Simulation yields an original sample of non-congested and congested data at a ratio of 1:10.1. According to the actual situation, cross-sampling is then used to produce datasets with class ratios of 1:5, 1:3, and 1:1, and the classification results for these three cases are compared and analyzed. Naive Bayes, Bayesian network, and neural network classifiers are used to compare cross-combination resampling against ordinary combination resampling on datasets of different ratios. The experimental results show that cross-combination resampling better resolves the problems that congestion data imbalance causes for classifiers.
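Keywords: congestion identification; imbalanced classification; resampling method; cross-combination; classifier
Classification code: U491.265 [Transportation Engineering: Transportation Planning and Management]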
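
The abstract describes combining random undersampling with SMOTE to reach a chosen class ratio, but it does not spell out how the "cross" step works. The sketch below therefore shows only the ordinary combination it builds on: undersample the majority class, then synthesize minority samples by interpolating between nearest neighbours. It is an illustration under stated assumptions, not the authors' procedure; the function names, the `n_keep` parameter, and the toy data are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_undersample(majority, n_keep, rng):
    """Keep n_keep majority samples, chosen uniformly without replacement."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def combined_resample(majority, minority, ratio, n_keep, k=5, seed=0):
    """Return (majority', minority') with minority:majority close to 1:ratio.

    Assumption: the caller decides how far to undersample via n_keep;
    SMOTE then fills the minority class up to round(n_keep / ratio).
    """
    rng = random.Random(seed)
    kept = random_undersample(majority, n_keep, rng)
    n_new = max(0, round(n_keep / ratio) - len(minority))
    return kept, minority + smote(minority, n_new, k, rng)


# Toy data at roughly the 1:10.1 ratio mentioned in the abstract.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(101)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]

kept, balanced_minority = combined_resample(majority, minority, ratio=5, n_keep=75)
print(len(balanced_minority), len(kept))  # 15 75
```

Splitting the adjustment between the two classes means fewer majority samples are discarded and fewer synthetic minority samples are invented than if either technique were used alone to reach the same ratio.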