On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing: A Systematic Review


Authors: Jiarui Xie, Lijun Sun, Yaoyao Fiona Zhao

Affiliations: [1] Additive Design and Manufacturing Lab, Department of Mechanical Engineering, McGill University, Montreal, QC H3A 0G4, Canada; [2] Smart Transportation Lab, Department of Civil Engineering, McGill University, Montreal, QC H3A 0G4, Canada

Source: Engineering (工程, English edition), 2025, No. 2, pp. 105-131 (27 pages)

Funding: Funded by the McGill University Graduate Excellence Fellowship Award (00157), the Mitacs Accelerate Program (IT13369), and the McGill Engineering Doctoral Award (MEDA).

Abstract: Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled ability to learn from existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation of how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance, and their applications in this domain, are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.

Keywords: Machine learning; Design and manufacturing; Data quality; Data augmentation; Active learning

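Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391 [Automation and Computer Technology: Control Science and Engineering]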

 
