Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method


Authors: HUANG Qihang; RU Xin; DAI Ning[1]; YU Bo; CHEN Wei; XU Yushan

Affiliations: [1] School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China; [2] Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing 312500, Zhejiang, China; [3] Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing 312500, Zhejiang, China

Source: Software Engineering (《软件工程》), 2024, Issue 7, pp. 22-27 (6 pages)

Funding: Zhejiang Provincial Science and Technology Program Project (2022C01202).

Abstract: To address the low data quality and high data redundancy that arise during data collection in weaving workshops, this paper proposes a comprehensive data cleaning method based on clustering analysis. First, the energy consumption of a textile enterprise's workshop is analyzed hierarchically, and an identification method based on the bisecting K-means algorithm is proposed for abnormal data. Second, missing data are filled with several imputation methods, chosen to suit the characteristics of each feature; to address the high redundancy, the coefficient of determination is introduced to deduplicate the dataset. Finally, simulation experiments are conducted on operating data from a textile enterprise's workshop. The results show that deduplication reduces the data volume of the dataset by 83%, while the mean absolute percentage error (MAPE) of the prediction experiments on the dataset fluctuates by less than 2%. The method therefore reduces data redundancy while preserving the reliability of prediction.
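The abstract names three building blocks: bisecting K-means for anomaly identification, the coefficient of determination for deduplication, and MAPE for checking prediction quality. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The function names, the 1-D simplification, the rule that undersized clusters count as anomalies (`min_share`), and the rule that a record is redundant when its R² against an already kept record reaches `threshold` are all assumptions made for illustration.

```python
from statistics import mean

def two_means_1d(values, iters=20):
    """Split 1-D values into two clusters with plain 2-means (Lloyd's algorithm)."""
    lo, hi = min(values), max(values)  # deterministic initial centres
    left, right = list(values), []
    for _ in range(iters):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not right:  # all values identical, nothing to split
            break
        lo, hi = mean(left), mean(right)
    return left, right

def sse(cluster):
    """Sum of squared errors of a cluster around its mean."""
    c = mean(cluster)
    return sum((v - c) ** 2 for v in cluster)

def bisecting_kmeans_1d(values, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(values)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        if len(set(worst)) < 2:  # cannot split a constant cluster
            break
        clusters.remove(worst)
        clusters.extend(two_means_1d(worst))
    return clusters

def flag_anomalies(values, k=3, min_share=0.2):
    """Assumed rule: members of clusters smaller than min_share of the data are anomalies."""
    limit = min_share * len(values)
    return sorted(v for c in bisecting_kmeans_1d(values, k) if len(c) < limit for v in c)

def r_squared(reference, candidate):
    """Coefficient of determination of candidate against reference."""
    ref_mean = mean(reference)
    ss_res = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot if ss_tot else float(ss_res == 0)

def drop_redundant(rows, threshold=0.99):
    """Assumed rule: drop a row if its R^2 against any kept row reaches threshold."""
    kept = []
    for row in rows:
        if not any(r_squared(k, row) >= threshold for k in kept):
            kept.append(row)
    return kept

def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))
```

For example, calling `flag_anomalies` on a series of readings near 10 that also contains a spike of 55.0 and a dropout of 0.0 isolates those two values in single-member clusters and flags them. Comparing `mape` on a model trained before and after `drop_redundant` mirrors the kind of check the abstract reports, under the assumptions stated above.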

Keywords: data cleaning; clustering; anomaly detection; deduplication

Classification No.: TP111.8 [Automation and Computer Technology: Control Theory and Control Engineering]

 
