Authors: QIU Mingxin (仇明鑫), LEI Shuai (雷帅), LIU Xianhui (柳先辉)[1], ZHANG Yingyao (张颖瑶)[1]
Affiliation: [1] School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Source: Computer Science (《计算机科学》), 2024, Issue S02, pp. 527-533 (7 pages)
Funding: National Key Research and Development Program of China (2022YFB3305802).
Abstract: In the resource recycling industry, multiple systems work together during the recycling of waste products and generate large volumes of multi-source heterogeneous data. To address the difficulty of fusing and effectively using online and offline recycling information for waste products, an online and offline multi-source heterogeneous data fusion system for recycling information is proposed. First, the system uses Web API interfaces to ingest online and offline multi-source heterogeneous data, and preprocesses the data through parsing, cleaning, and conversion. Second, because existing clustering-based data fusion methods usually require the number of clusters to be specified in advance, a fusion method based on multi-objective clustering is proposed that determines the number of clusters automatically during fusion. The preprocessed data undergo feature selection, label encoding, data conversion, and normalization; a multi-objective clustering algorithm then performs feature extraction and clustering on a subset of typical data, and the full and incremental data are matched based on Euclidean distance. Finally, the system adopts a distributed database scheme based on MyCat middleware and MySQL master-slave replication to store, share, and exchange the fused data. Tests show that the system achieves data fusion, sharing, and exchange of online and offline multi-source heterogeneous recycling information for waste products. Moreover, compared with the K-Means-based data fusion method, the proposed method based on multi-objective clustering automatically determines the optimal number of clusters on different data sets and achieves intra-cluster compactness and inter-cluster separation no worse than those of the K-Means fusion method.
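The abstract names label encoding, normalization, and Euclidean-distance matching as steps of the fusion method but gives no implementation details. The following is a minimal sketch of those three steps only, not the authors' method: the function names, the sample records, and the hand-picked centroids are all hypothetical, and the centroids merely stand in for the output of the multi-objective clustering stage, which is not reproduced here.

```python
import math


def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return [codes[value] for value in values], codes


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def match_to_cluster(record, centroids):
    """Return the index of the centroid nearest to `record` by Euclidean distance."""
    distances = [math.dist(record, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Hypothetical recycling records: a product category and a weight in kilograms.
categories = ["phone", "laptop", "phone", "battery"]
weights = [0.2, 2.1, 0.25, 0.05]

codes, mapping = label_encode(categories)
rows = [[code, weight] for code, weight in zip(codes, weights)]
normalized = min_max_normalize(rows)

# Assumed centroids; in the described system these would come from clustering
# a subset of typical data, which this sketch does not implement.
centroids = [[0.0, 0.1], [0.5, 1.0]]
assignments = [match_to_cluster(record, centroids) for record in normalized]
```

Under this reading, incremental records could be matched to existing clusters through `match_to_cluster` alone, provided they are normalized with the same column ranges as the data used for clustering; whether the system works this way is not stated in the abstract.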
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]