Research on coagulation dosing prediction based on K-Means clustering and ensemble learning under small sample data  (Cited by: 3)

Authors: WANG Shijie[1]; LI Yiming[1]; ZHI Yin; WU Renchao; WANG Tao; CHENG Ziwei; ZHENG Lei[1]; XIAO Feng

Affiliations: [1] College of Water Resources and Hydropower Engineering, North China Electric Power University, Beijing 102206, China; [2] Beijing Global Water Technology Co., Ltd., Beijing 100085, China; [3] Ningxia Great Wall Water Co., Ltd., Yinchuan 750004, China

Source: Chinese Journal of Environmental Engineering (《环境工程学报》), 2024, No. 1, pp. 181-188 (8 pages)

Funding: National Natural Science Foundation of China (52030003).

Abstract: To address the small-sample problem in coagulant dosing prediction, a PAC dosage prediction method based on K-Means clustering and ensemble learning is proposed. First, K-Means clustering on two features, raw water turbidity and water temperature, divides water quality into three categories, and stratified sampling draws the training and test sets from these three categories. Second, a PAC dosage ensemble prediction model (KM-Bagging) is constructed with the Bagging ensemble learning algorithm from seven learners: Support Vector Machine, Random Forest, Adaboost, Gradient Boosting Decision Tree (GBDT), Catboost, XGBoost, and LightGBM. Finally, the method is validated on 2021 to 2022 operational data from a water supply plant in Yinchuan City. The results show that the KM-Bagging model predicts PAC dosage from small samples with high accuracy, with R^2 exceeding 0.8 and MAPE below 5%. Predictions based on 6 and 9 months of daily monitoring data suit cases where the monitoring period is short and high accuracy is not required, and they can serve as a reference for adjusting the PAC dosage when raw water quality changes suddenly. With one year of daily monitoring data, the prediction accuracy meets the requirements of engineering applications and can provide auxiliary guidance for actual PAC dosing at the plant. The results offer a reference for modeling and predicting coagulant dosing under small-sample data.
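The workflow summarized in the abstract (cluster, stratify, bag, score) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the features are standardized before clustering (not stated in the abstract), and a single k-nearest-neighbour regressor fitted on bootstrap resamples stands in for the seven learners, since the abstract does not say how KM-Bagging combines them. All function names are hypothetical.

```python
import math
import random

def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def stratified_split(labels, test_frac=0.2, seed=0):
    """Draw the same test fraction from every cluster (at least one row)."""
    rng = random.Random(seed)
    train, test = [], []
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def knn_predict(train_x, train_y, x, k=5):
    """Stand-in base learner: mean target of the k closest training rows."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(x, train_x[i]))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def fit_bagging(train_x, train_y, n_models=7, seed=0):
    """Build one bootstrap resample (drawn with replacement) per learner."""
    rng = random.Random(seed)
    n = len(train_x)
    models = []
    for _ in range(n_models):
        boot = [rng.randrange(n) for _ in range(n)]
        models.append(([train_x[i] for i in boot], [train_y[i] for i in boot]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all bootstrap-trained learners."""
    return sum(knn_predict(bx, by, x) for bx, by in models) / len(models)

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in for ~6 months of daily records (NOT plant data):
# columns are raw water turbidity and water temperature; target is a made-up dose.
rng = random.Random(42)
raw = [[rng.uniform(2, 60), rng.uniform(2, 28)] for _ in range(180)]
dose = [10 + 0.3 * turb - 0.2 * temp + rng.gauss(0, 0.5) for turb, temp in raw]

X = zscore(raw)
labels = kmeans(X, k=3)
train_idx, test_idx = stratified_split(labels, test_frac=0.2)
models = fit_bagging([X[i] for i in train_idx], [dose[i] for i in train_idx])
pred = [bagging_predict(models, X[i]) for i in test_idx]
truth = [dose[i] for i in test_idx]
print(f"R2 = {r2_score(truth, pred):.3f}, MAPE = {mape(truth, pred):.2f}%")
```

Stratifying on the cluster label keeps every water-quality category represented in both the training and test sets, which matters most when only a few months of records are available. The scores printed for the synthetic data say nothing about the figures reported in the abstract.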

Keywords: coagulant dosage prediction; small sample data; Bagging ensemble learning; K-Means clustering

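Classification No.: TU991.22 [Building Science: Municipal Engineering]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]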

 
