Authors: Xue Jia (贾雪); Alex Aziz; Yusuke Hashimoto; Hao Li (李昊)
Affiliations: [1] Advanced Institute for Materials Research (WPI-AIMR), Tohoku University, Sendai 980-8577, Japan [2] Tohoku Forum for Creativity, Tohoku University, Sendai 980-8577, Japan
Source: Science China Materials (中国科学:材料科学, English edition), 2024, Issue 4, pp. 1173-1182 (10 pages)
Funding: Supported by the JSPS KAKENHI (JP23K13599) and the Hirose Foundation.
Abstract: The development of artificial intelligence (AI), particularly data science and machine learning (ML), is revolutionizing the field of materials science. Yet some key challenges remain, including errors contained in large-scale material datasets and overfitting when predicting temperature-dependent properties. In this work, using thermoelectric (TE) materials as an archetypal example, we first performed a series of rational actions to identify and discard questionable data, and obtained 92,291 data points covering 7295 compositions at different temperatures from the Starrydata2 database. Next, we proposed a composition-based cross-validation method, emphasizing that data points with the same composition but different temperatures should not be split into different sets, in order to avoid overfitting. Then, we built ML models using the gradient boosting decision tree (GBDT) method and achieved remarkable R² values of ~0.89, ~0.90, and ~0.89 on the training dataset, the test dataset, and new out-of-sample experimental data published in 2023, respectively, verifying the model's high accuracy in predicting newly available materials. Using this ML model, we carried out a large-scale evaluation of the stable materials from the Materials Project database, and Ge2Te5As2 and Ge3(Te3As)2 were predicted to exhibit high zT values. Density functional theory calculations were then executed, and the calculated maximum zT values were 1.98 and 2.12 for n- and p-type Ge2Te5As2, and 0.58 and 0.74 for n- and p-type Ge3(Te3As)2, respectively, indicating their potential as TE materials and supporting our ML model. This work presents an example of dealing with and overcoming big data challenges in AI for materials science.
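The composition-based cross-validation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `composition_kfold`, the round-robin fold assignment, and the toy compositions and temperatures are all assumptions chosen for illustration. The only idea taken from the abstract is that data points sharing a composition but differing in temperature are never split between the training and test sets.

```python
import random
from collections import defaultdict


def composition_kfold(compositions, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) so that each composition is held out exactly once.

    compositions: one composition label per data point (hypothetical input format).
    """
    # Collect the row indices that belong to each distinct composition.
    rows_by_comp = defaultdict(list)
    for i, comp in enumerate(compositions):
        rows_by_comp[comp].append(i)

    unique = sorted(rows_by_comp)
    if n_splits < 2 or n_splits > len(unique):
        raise ValueError("n_splits must be between 2 and the number of compositions")

    # Shuffle compositions (not individual rows), then deal them into folds.
    random.Random(seed).shuffle(unique)
    folds = [unique[k::n_splits] for k in range(n_splits)]

    for held_out in folds:
        held = set(held_out)
        test_idx = sorted(i for c in held_out for i in rows_by_comp[c])
        train_idx = [i for i, c in enumerate(compositions) if c not in held]
        yield train_idx, test_idx


# Toy data (assumed, not from the paper): each composition measured at three temperatures.
samples = [(comp, temperature)
           for comp in ["Bi2Te3", "PbTe", "SnSe", "GeTe", "Mg2Si", "CoSb3"]
           for temperature in (300, 400, 500)]
comps = [c for c, _ in samples]

for train_idx, test_idx in composition_kfold(comps, n_splits=3):
    train_comps = {comps[i] for i in train_idx}
    test_comps = {comps[i] for i in test_idx}
    # All temperatures of a given composition land on the same side of the split.
    assert train_comps.isdisjoint(test_comps)
```

A plain random split over rows would typically place the 300 K point of a composition in training and its 400 K point in testing, which is the leakage the abstract warns about. In practice, scikit-learn's `GroupKFold` with the composition labels passed as `groups` gives the same guarantee.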
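Keywords: artificial intelligence; big data; machine learning; overfitting; thermoelectric materials; materials data; database; temperature dependence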
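Classification: TB34 [General Industrial Technology: Materials Science and Engineering]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP311.13 [Automation and Computer Technology: Control Science and Engineering]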