Authors: LI Delun (李德伦); XIAO Zhixiang (肖志祥); XIE Ningxin (谢宁新) [3]; GONG Rong (龚荣)
Affiliations: [1] School of Electronic Information, Guangxi Minzu University, Nanning 530000, Guangxi, China [2] Guangxi Institute of Meteorological Sciences, Nanning 530022, Guangxi, China [3] School of Artificial Intelligence, Guangxi Minzu University, Nanning 530000, Guangxi, China
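Source: Journal of Chengdu University of Information Technology (《成都信息工程大学学报》), 2023, No. 5, pp. 602-609 (8 pages)
Funding: National Natural Science Foundation of China (41905077); Guangxi Key Research and Development Program (桂科AB21196041); Guangxi Meteorological Bureau Scientific Research Program (桂气科2021ZL05).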
Abstract: To address the poor performance and unstable results of single feature selection methods in machine learning, a hybrid feature selection method (SpearmanXgb) that combines the Spearman correlation coefficient with XGBoost feature importance is proposed. It is used together with three machine learning algorithms (RF, XGBoost and LightGBM) to correct the near-surface 2 m air temperature over Guangxi in spring and summer as forecast by the ECMWF model. The results show that: (1) The hybrid method outperforms both single methods, Spearman correlation coefficient selection and XGBoost feature importance selection, in training time and root mean square error (RMSE): training time is reduced by 19.7% and 10.3%, and RMSE by 0.94% and 0.64%, respectively. (2) Compared with the ECMWF model, the average RMSE of the temperature predicted by the three models decreases by 7.04%, 7.47% and 7.37%, respectively; XGBoost performs better at early lead times (24-96 h), while LightGBM performs better at middle and late lead times (120-240 h). (3) Southeastern and northeastern Guangxi have complex terrain dominated by mountains and hills and are easily affected by complex weather processes such as typhoons and the first rainy season in South China, so temperature varies widely there, and both the ECMWF model and the three machine learning models have relatively high forecast errors in these two regions. (4) SHAP values are used to analyze how sensitive the model results are to the magnitude of each feature's values; tests show that more accurate input features reduce the models' RMSE to varying degrees, which suggests a way to improve ECMWF forecast performance.
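The abstract does not say how SpearmanXgb combines its two criteria, so the following is only a minimal sketch under an assumed two-stage scheme: first filter features by the absolute Spearman correlation with the target, then rank the survivors by an importance score. The importance scores are passed in as a plain dictionary standing in for values taken from a trained XGBoost model; the function names, the threshold `min_abs_rho`, and the cutoff `top_k` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def hybrid_select(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    importances: Dict[str, float],
    min_abs_rho: float = 0.3,  # hypothetical threshold, not from the paper
    top_k: int = 2,            # hypothetical cutoff, not from the paper
) -> List[str]:
    """Assumed two-stage selection: Spearman filter, then importance ranking."""
    # Stage 1 (assumption): drop features weakly rank-correlated with the target.
    kept = [
        name for name, column in features.items()
        if abs(spearman(column, target)) >= min_abs_rho
    ]
    # Stage 2 (assumption): order survivors by externally supplied importance,
    # e.g. scores read from a trained XGBoost model.
    kept.sort(key=lambda name: importances.get(name, 0.0), reverse=True)
    return kept[:top_k]
```

With this scheme, a feature that a tree model rates as important but that shows no monotonic relationship with the target is removed in the first stage, which is one plausible way a hybrid method could give more stable selections than either criterion alone.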
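Keywords: atmospheric science; temperature forecasting; machine learning; hybrid feature selection; 2 m air temperature correction
Classification number: P457.3 [Astronomy and Earth Sciences: Atmospheric Science and Meteorology]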