Random Forest Based Operational Missing Data Imputation for Highway Tunnel (基于随机森林的公路隧道运营缺失数据插补方法)  Cited by: 11


Authors: Qian Chao (钱超)[1], Chen Jianxun (陈建勋)[2], Luo Yanbin (罗彦斌)[2], Dai Liang (代亮)[1]

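Affiliations: [1] School of Electronic and Control Engineering, Chang'an University, Xi'an 710064; [2] School of Highway, Chang'an University, Xi'an 710064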

Source: Journal of Transportation Systems Engineering and Information Technology (《交通运输系统工程与信息》), 2016, No. 3, pp. 81-87 (7 pages)

Funding: 973 Program (2013CB036003); National Natural Science Foundation of China (51408054); Fundamental Research Funds for the Central Universities (310832161006)

Abstract: Real-time, complete acquisition and in-depth mining of tunnel operational data, such as the in-tunnel environment and traffic state, is the basis for improving emergency response capacity and achieving operational safety early warning. This paper proposes a missing data imputation method based on random forest. The incomplete data set is first partitioned according to its missing features. A random forest regression model is then built to impute the missing values iteratively, and a stopping criterion for the iteration is determined. The optimal combination of the number of decision trees and the number of variables randomly sampled at each split is selected by minimizing the normalized root mean square error (NRMSE). Imputation results on a highway tunnel operational data set with missing values show that the method is accurate and robust: compared with the KNN, SVD, MICE and PPCA imputation methods, it reduces NRMSE by more than 25%. Parallel computation substantially improves imputation efficiency, offsetting the method's slow imputation speed and ensuring that imputation is both effective and timely.
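The iterative scheme summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy and scikit-learn, a missForest-style stopping rule (stop once the change between successive imputations starts to grow), and an NRMSE normalized by the variance of the true values. The abstract does not specify either definition. The names `rf_impute` and `nrmse`, the synthetic data, and the parameter values are all hypothetical; `n_jobs=-1` simply lets scikit-learn build trees in parallel and may differ from the parallel scheme used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_impute(X, n_trees=100, max_features=None, max_iter=10, seed=0):
    """Iteratively impute NaNs in a 2-D array with random forest regressors.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()
    # Initial guess: fill each column with the mean of its observed values.
    X_imp = np.where(missing, np.nanmean(X, axis=0), X)
    # Visit incomplete columns from fewest to most missing values.
    order = [j for j in np.argsort(missing.sum(axis=0)) if missing[:, j].any()]
    prev_diff = np.inf
    for _ in range(max_iter):
        X_old = X_imp.copy()
        for j in order:
            obs = ~missing[:, j]
            others = np.delete(X_imp, j, axis=1)  # all other columns as predictors
            model = RandomForestRegressor(
                n_estimators=n_trees,       # number of decision trees
                max_features=max_features,  # variables sampled at each split
                n_jobs=-1,                  # build trees in parallel
                random_state=seed,
            )
            model.fit(others[obs], X_imp[obs, j])
            X_imp[~obs, j] = model.predict(others[~obs])
        # Relative change of the imputed entries between successive iterations.
        diff = np.sum((X_imp[missing] - X_old[missing]) ** 2) / np.sum(
            X_imp[missing] ** 2
        )
        if diff > prev_diff:
            return X_old  # assumed rule: change grew, keep the previous estimate
        prev_diff = diff
    return X_imp


def nrmse(X_true, X_imp, mask):
    """RMSE over the masked entries, normalized by the variance of the true values."""
    err = X_true[mask] - X_imp[mask]
    return float(np.sqrt(np.mean(err ** 2) / np.var(X_true[mask])))


# Synthetic, correlated data standing in for tunnel operational records.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X_true = np.hstack([z + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])
mask = rng.random(X_true.shape) < 0.1  # hide roughly 10% of entries at random
X_miss = X_true.copy()
X_miss[mask] = np.nan

X_hat = rf_impute(X_miss, n_trees=50, max_features=2)
score = nrmse(X_true, X_hat, mask)
print(f"NRMSE on artificially removed entries: {score:.3f}")
```

Under these assumptions, the tree count and the number of variables per split could be tuned by repeating the last three lines over a grid of `n_trees` and `max_features` values and keeping the pair with the smallest NRMSE.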

Keywords: highway transportation; missing data imputation; random forest; highway tunnel; operation management

Classification No.: U491 [Transportation Engineering: Transportation Planning and Management]

 
