Quick-MIMIC: A Multimodal Data Extraction Pipeline for MIMIC with Parallelization

Authors: Yutao Dou, Wei Li, Yangtao Zheng, Xiaojun Yao, Huanxiang Liu, Albert Y. Zomaya, Shaoliang Peng

Affiliations: [1] College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China, and also with Centre for Distributed and High Performance Computing, School of Computer Science, The University of Sydney, Darlington, NSW 2008, Australia; [2] College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; [3] Centre for Distributed and High Performance Computing, School of Computer Science, The University of Sydney, Darlington, NSW 2008, Australia; [4] Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

Source: Big Data Mining and Analytics, 2024, Issue 4, pp. 1333-1346 (14 pages). Chinese journal title: 大数据挖掘与分析 (English edition)

Funding: Supported by the National Natural Science Foundation of China-Science and Technology Development Fund (No. 62361166662); the National Key R&D Program of China (Nos. 2023YFC3503400 and 2022YFC3400400); the Key R&D Program of Hunan Province (Nos. 2023GK2004, 2023SK2059, and 2023SK2060); the Top 10 Technical Key Project in Hunan Province (No. 2023GK1010); the Key Technologies R&D Program of Guangdong Province (No. 2023B1111030004); the Funds of State Key Laboratory of Chemo/Biosensing and Chemometrics; the National Supercomputing Center in Changsha (http://nscc.hnu.edu.cn/); and Peng Cheng Lab.

Abstract: Medical big data combined with artificial intelligence are vital to advancing digital medicine. However, the opaque and non-standardised procedures used in most medical data extraction are prone to batch effects and have become a significant obstacle to reproducing previous work. This paper develops an easy-to-use time-series multimodal data extraction pipeline, Quick-MIMIC, for standardised data extraction from the MIMIC datasets. Our method fully integrates structured, semi-structured, and unstructured data into a single time-series table. We also add two modules to Quick-MIMIC: a pipeline parallelization method, which reduces data extraction time, and data analysis methods, which present the characteristics of the extracted data intuitively. Extensive experimental results show that our pipeline efficiently extracts the required data from the MIMIC datasets and converts it into the correct format for downstream analytic tasks.
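
The abstract describes the pipeline only at a high level. As a rough illustration of the general idea, and not Quick-MIMIC's actual implementation, the sketch below merges each patient's structured measurements and free-text notes into hourly time-series rows and processes patients in parallel. The toy rows, the column names, the hourly binning with mean aggregation, and the use of a thread pool are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from statistics import mean

# Hypothetical toy rows, loosely modelled on MIMIC-style event tables.
# Structured events: (subject_id, charttime, variable, numeric value)
STRUCTURED = [
    (1, "2150-01-01 08:10", "heart_rate", 88.0),
    (1, "2150-01-01 08:40", "heart_rate", 92.0),
    (1, "2150-01-01 09:05", "sbp", 121.0),
    (2, "2150-03-02 14:20", "heart_rate", 75.0),
]
# Unstructured notes: (subject_id, charttime, free text)
NOTES = [
    (1, "2150-01-01 08:55", "Patient alert."),
    (2, "2150-03-02 14:50", "No acute distress."),
]


def hour_bin(ts: str) -> datetime:
    """Truncate a timestamp string to the start of its hour (assumed bin size)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M").replace(minute=0)


def build_patient_series(subject_id: int) -> list[dict]:
    """Merge one patient's structured values and notes into hourly rows."""
    numeric = defaultdict(lambda: defaultdict(list))  # hour -> variable -> values
    texts = defaultdict(list)                         # hour -> note texts
    for sid, ts, var, val in STRUCTURED:
        if sid == subject_id:
            numeric[hour_bin(ts)][var].append(val)
    for sid, ts, text in NOTES:
        if sid == subject_id:
            texts[hour_bin(ts)].append(text)

    rows = []
    for t in sorted(set(numeric) | set(texts)):
        row = {"subject_id": subject_id, "hour": t}
        # Average repeated measurements that fall into the same hour.
        row.update({var: mean(vals) for var, vals in numeric[t].items()})
        # Concatenate notes written in the same hour ("" if none).
        row["notes"] = " ".join(texts[t])
        rows.append(row)
    return rows


def build_table(subject_ids: list[int], workers: int = 4) -> list[dict]:
    """Build each patient's series independently in parallel, then concatenate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as subject_ids.
        per_patient = pool.map(build_patient_series, subject_ids)
        return [row for rows in per_patient for row in rows]


if __name__ == "__main__":
    for row in build_table([1, 2]):
        print(row)
```

Because each patient's rows depend only on that patient's events, the per-patient step can run without coordination between workers; for CPU-bound work on full-size MIMIC tables, a `ProcessPoolExecutor` or a dataframe library would likely be a better fit than threads.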

Keywords: MIMIC dataset; data extraction pipeline; data integration

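CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.41 [Automation and Computer Technology: Control Science and Engineering]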
