A Quantitative Noise Method to Evaluate Machine Learning Algorithm on Multi-Fidelity Data


Authors: LIU Xiaotong; WANG Ziming; OUYANG Jiahua; YANG Tao [1,2]

Affiliations: [1] Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science and Technology University, Beijing 100101, China; [2] School of Computer, Beijing Information Science and Technology University, Beijing 100101, China; [3] School of Information Science and Technology, Jinan University, Guangzhou 511442, China

Source: Journal of The Chinese Ceramic Society (硅酸盐学报), 2023, Issue 2, pp. 405-410 (6 pages)

Funding: National Natural Science Foundation of China (22203008, 22272009).

Abstract: Multi-fidelity data are currently the main form in which data exist in materials science. On the data-production side, different quantum methods give substantially different results when computing the same material property. On the data-consumption side, researchers have designed various machine learning methods to extract as much knowledge as possible from such data. In this paper, quantitative noise addition is used to evaluate how noise of different types and intensities affects several multi-fidelity data learning methods, and iterative noise reduction is used to verify the scenarios in which a data correction method applies. The results show that the way multi-fidelity data are exploited is crucial: the size and the noise level of each sub-dataset must be considered together. On a variety of datasets constructed with different noise types and intensities, the "Onion" training method, which gradually deletes lower-fidelity data, clearly outperforms the "one by one" training method, which trains on the datasets one at a time in the direction of decreasing noise. The advantage is attributed to the synergistic effect among the data. In multi-fidelity training, linear noise has the smaller impact on the final model regardless of noise intensity or training method. Sampled noise simulates real multi-fidelity data more closely at every stage and is recommended for future research. In addition, when the noise is complex, a small amount of true-value data can hardly play a corrective role, so such data are better suited to iterative noise reduction.
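The two training orders compared in the abstract can be illustrated with a minimal sketch. It covers only the scheduling of datasets across training stages, as the abstract describes it, and not the authors' implementation. The function names, the dataset names, and the assumption that each stage simply continues training the same model are all hypothetical.

```python
from typing import List, Sequence


def onion_schedule(datasets: Sequence[str]) -> List[List[str]]:
    """Stages for "Onion" training: start from all datasets, then drop the
    lowest-fidelity one at each stage until only the highest-fidelity remains.

    `datasets` must be ordered from lowest fidelity (noisiest) to highest.
    """
    return [list(datasets[i:]) for i in range(len(datasets))]


def one_by_one_schedule(datasets: Sequence[str]) -> List[List[str]]:
    """Stages for "one by one" training: a single dataset per stage,
    visited in order of decreasing noise."""
    return [[name] for name in datasets]


# Hypothetical dataset names, ordered noisiest -> cleanest.
fidelities = ["low", "medium", "high", "truth"]
onion = onion_schedule(fidelities)
sequential = one_by_one_schedule(fidelities)

for stage, (a, b) in enumerate(zip(onion, sequential), start=1):
    print(f"stage {stage}: onion={a} one_by_one={b}")
```

In the "Onion" order every stage still includes the higher-fidelity data, which is one way to read the synergistic effect among datasets that the abstract mentions. In the "one by one" order each stage uses a single dataset.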

Keywords: multi-fidelity; property prediction; machine learning; quantitative noise

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
