Data Integration Method for Multi-source Tobacco Logistics in Heterogeneous Environments Based on K-medoids Clustering

Authors: GUO Guanggen (郭光根); HE Rui (何蕊)[1]; ZHANG Yujun (张玉军)

Affiliation: [1] Kunming Cigarette Factory, Hongyun Honghe Tobacco (Group) Co., Ltd., Kunming 650000

Source: Technology Innovation and Application (《科技创新与应用》), 2024, No. 35, pp. 39-43 (5 pages)

Abstract: Data in the tobacco logistics industry come from a wide variety of sources, differ in format and structure, and are often scattered across separate information systems. As a result, data throughput is low when logistics data are integrated. To address this problem, a multi-source tobacco logistics data integration method for heterogeneous environments based on K-medoids clustering is proposed. Undersampling balances the class distribution, and correlation analysis with threshold-based cleaning removes redundant information, improving the quality of the multi-source data. A tobacco logistics data integration framework based on K-medoids clustering is then designed: transfer learning dynamically adjusts the source-domain weights to optimize clustering performance in the target domain, and new data points subject to similarity constraints are introduced as the initial cluster centers. Experimental results show that the proposed method effectively groups and integrates data from different sources through clustering, reducing the complexity of data processing and improving data integration throughput.

Keywords: K-medoids clustering; heterogeneous environment; multi-source data; tobacco logistics data; data integration method

Classification code: TP311.1 [Automation and Computer Technology: Computer Software and Theory]
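
The three generic building blocks named in the abstract (undersampling, correlation-threshold cleaning, and K-medoids clustering) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the 0.95 threshold, the Manhattan distance, and the toy data are all assumptions made for illustration. The transfer-learning weighting of source domains and the similarity-constrained choice of initial centers are not reproduced here; the hypothetical `init_medoids` argument only marks where such an initialization would plug in.

```python
import random
from collections import defaultdict


def undersample(records, labels, seed=0):
    """Randomly cut every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for lab in sorted(by_class):
        balanced.extend((rec, lab) for rec in rng.sample(by_class[lab], n_min))
    return balanced


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def drop_correlated(rows, threshold=0.95):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= threshold for k in kept):
            kept.append(j)
    return [[row[j] for j in kept] for row in rows], kept


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: manhattan(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, init_medoids, max_iter=100):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    medoids = list(init_medoids)  # indices into `points`
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(old)
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda m: sum(manhattan(points[m], points[o]) for o in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy usage: two well-separated groups of 2-D records.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, init_medoids=[1, 4])
print(medoids, labels)
```

Unlike K-means, each cluster center here is always an actual record, which is why the method only needs a pairwise distance and can be applied to records whose features do not average meaningfully.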

 
