A visual analysis approach for data imputation via multi-party tabular data correlation strategies  


Authors: Haiyang ZHU, Dongming HAN, Jiacheng PAN, Yating WEI, Yingchaojie FENG, Luoxuan WENG, Ketian MAO, Yuankai XING, Jianshu LV, Qiucheng WAN, Wei CHEN

Affiliations: [1] The State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China; [2] Wuchan Zhongda Digital Technology Co., Ltd., Hangzhou 310020, China; [3] Zhejiang Metals and Materials Co., Ltd., Hangzhou 310005, China

Source: Frontiers of Information Technology & Electronic Engineering (Chinese title: 信息与电子工程前沿 (英文版)), 2024, No. 3, pp. 398-414 (17 pages)

Funding: Project supported by the Key R&D "Pioneer" Tackling Plan Program of Zhejiang Province, China (No. 2023C01119); the "Ten Thousand Talents Plan" Science and Technology Innovation Leading Talent Program of Zhejiang Province, China (No. 2022R52044); the Major Standardization Pilot Projects for the Digital Economy (Digital Trade Sector) of Zhejiang Province, China (No. SJ-Bz/2023053); and the National Natural Science Foundation of China (No. 62132017).

Abstract: Data imputation is an essential pre-processing task in data governance that aims to fill in incomplete data. Conventional data imputation methods, however, work on isolated tabular data. As a result, they only partly alleviate data incompleteness and fail to strike the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. We then perform an initial imputation of incomplete data using correlated data entries from other tables. In addition, we develop a visual analysis system for refining the data imputation candidates. The interactive system combines the multi-party data imputation approach with expert knowledge, giving users a better understanding of the relational structure of the data. This significantly improves the accuracy and efficiency of data imputation, thereby raising the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that the method helps users apply their domain knowledge to verify and judge the associated columns and similar rows.

Keywords: Data governance; Data incompleteness; Data imputation; Data visualization; Interactive visual analysis

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
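The abstract describes a two-step strategy: first match similar columns across tables, then fill gaps from correlated rows in another table. It does not say which "intelligent algorithms" the authors use. The sketch below is a rough, hypothetical illustration of those two steps, not the paper's method. It makes three assumptions:

- Columns are matched by Jaccard similarity of their observed value sets.
- Rows are correlated by counting agreements on the matched columns.
- The function names `match_columns` and `impute`, the 0.5 threshold, and the sample tables are all made up for illustration.

```python
from typing import Optional

# A row maps column name -> cell value; None marks a missing cell.
Row = dict[str, Optional[str]]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two value sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_columns(left: list[Row], right: list[Row],
                  threshold: float = 0.5) -> dict[str, str]:
    """Map each left column to the right column whose observed values overlap most.

    Hypothetical stand-in for a column-association step; the 0.5 threshold is assumed.
    """
    if not left or not right:
        return {}
    mapping: dict[str, str] = {}
    for lc in left[0]:
        lvals = {r[lc] for r in left if r[lc] is not None}
        best, best_score = None, -1.0
        for rc in right[0]:
            rvals = {r[rc] for r in right if r[rc] is not None}
            score = jaccard(lvals, rvals)
            if score >= threshold and score > best_score:
                best, best_score = rc, score
        if best is not None:
            mapping[lc] = best
    return mapping


def impute(left: list[Row], right: list[Row],
           mapping: dict[str, str]) -> list[Row]:
    """Fill missing left cells from the right row agreeing on the most matched columns."""
    filled = [dict(r) for r in left]  # copy so the input table is not mutated
    for row in filled:
        missing = [lc for lc in mapping if row[lc] is None]
        if not missing:
            continue
        best_row, best_hits = None, 0
        for cand in right:
            # Count matched columns where both rows hold the same known value.
            hits = sum(1 for lc, rc in mapping.items()
                       if row[lc] is not None and row[lc] == cand[rc])
            if hits > best_hits:
                best_row, best_hits = cand, hits
        if best_row is not None:
            for lc in missing:
                value = best_row[mapping[lc]]
                if value is not None:
                    row[lc] = value
    return filled


# Hypothetical sample tables with different column names for the same data.
left = [
    {"id": "A1", "name": "steel", "origin": "Hangzhou"},
    {"id": "A2", "name": "copper", "origin": None},
    {"id": "A3", "name": "zinc", "origin": "Ningbo"},
]
right = [
    {"code": "A1", "material": "steel", "city": "Hangzhou"},
    {"code": "A2", "material": "copper", "city": "Shanghai"},
    {"code": "A3", "material": "zinc", "city": "Ningbo"},
]
mapping = match_columns(left, right)
print(mapping)  # {'id': 'code', 'name': 'material', 'origin': 'city'}
print(impute(left, right, mapping)[1])  # {'id': 'A2', 'name': 'copper', 'origin': 'Shanghai'}
```

Exact value overlap only links columns that share literal values. Real multi-party tables would likely also need schema names, format normalization, or learned similarity. The paper itself complements the automatic step with expert refinement in an interactive visual system.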
