A Study on Correcting ECMWF Model Forecasts of Spring and Summer Surface Air Temperature in Guangxi with a Hybrid Feature Selection Method in Machine Learning  Cited by: 1

Authors: LI Delun; XIAO Zhixiang; XIE Ningxin[3]; GONG Rong

Affiliations: [1] School of Electronic Information, Guangxi Minzu University, Nanning 530000, Guangxi, China [2] Guangxi Institute of Meteorological Sciences, Nanning 530022, Guangxi, China [3] School of Artificial Intelligence, Guangxi Minzu University, Nanning 530000, Guangxi, China

Source: Journal of Chengdu University of Information Technology, 2023, No. 5, pp. 602-609 (8 pages)

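Funding: National Natural Science Foundation of China (41905077); Guangxi Key Research and Development Program (桂科AB21196041); Guangxi Meteorological Bureau Scientific Research Program (桂气科2021ZL05).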

Abstract: To address the weak performance and unstable results of single feature selection methods in machine learning, a hybrid feature selection method (SpearmanXgb) that combines the Spearman correlation coefficient with XGBoost feature importance is proposed. It is used together with three machine learning algorithms (RF, XGBoost and LightGBM) to correct the ECMWF model forecasts of near-surface 2 m air temperature over Guangxi in spring and summer. The results show that: (1) The hybrid method outperforms the single Spearman correlation coefficient and XGBoost feature importance selection methods in both training time and root mean square error (RMSE): training time is reduced by 19.7% and 10.3%, and RMSE by 0.94% and 0.64%, respectively. (2) Compared with the ECMWF model, the average RMSE of the temperatures predicted by the three models decreases by 7.04%, 7.47% and 7.37%, respectively. XGBoost performs better at early lead times (24-96 h), while LightGBM performs better at middle and late lead times (120-240 h). (3) Southeastern and northeastern Guangxi are dominated by mountains and hills, so the terrain is complex, and both regions are easily affected by complex weather processes such as typhoons and the first rainy season in South China, which leads to large temperature variations. As a result, the forecast errors of both the ECMWF model and the three machine learning models are relatively high in these two regions. (4) SHAP values are used to analyze how sensitive the model results are to the magnitude of each feature's value. Tests indicate that more accurate selected features reduce the models' RMSE to varying degrees, which offers an approach for improving ECMWF forecast performance.
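
The abstract does not state how SpearmanXgb combines its two criteria, so the following is only a minimal sketch under an assumed rule: each feature is ranked once by the absolute Spearman correlation with the target and once by an importance score, the two rank positions are summed, and the k features with the smallest sum are kept. The names `average_ranks`, `spearman` and `hybrid_select`, the feature names, and the combination rule itself are hypothetical and not taken from the paper. The importance scores are assumed to be supplied by the caller, for example from a fitted XGBoost model.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no rank information
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def hybrid_select(features, target, importances, k):
    """Keep the k features with the best combined rank position.

    features:    dict mapping feature name -> list of values
    importances: dict mapping feature name -> importance score
                 (assumed to come from a fitted tree model)
    The rank-sum rule is an assumption made for illustration.
    """
    names = list(features)
    rho = {n: abs(spearman(features[n], target)) for n in names}
    by_rho = sorted(names, key=lambda n: -rho[n])
    by_imp = sorted(names, key=lambda n: -importances[n])
    score = {n: by_rho.index(n) + by_imp.index(n) for n in names}
    # Ties in the combined score are broken alphabetically.
    return sorted(names, key=lambda n: (score[n], n))[:k]


# Toy data with hypothetical predictor names.
features = {
    "t2m": [1, 2, 3, 4, 5],
    "rh": [5, 3, 4, 2, 1],
    "noise": [3, 1, 4, 1, 5],
}
target = [1.1, 2.3, 2.9, 4.2, 5.0]
importances = {"t2m": 0.6, "rh": 0.3, "noise": 0.1}
selected = hybrid_select(features, target, importances, k=2)
```

A rank-based combination is used here because correlation coefficients and tree importances live on different scales; other rules, such as intersecting two top-k lists, would be equally plausible readings of "hybrid".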

Keywords: atmospheric science; temperature forecast; machine learning; hybrid feature selection; 2 m air temperature correction

Classification code: P457.3 [Astronomy and Earth Sciences - Atmospheric Sciences and Meteorology]

 
