Authors: 宋巍 (SONG Wei), 张贵庆 (ZHANG Guiqing), 谢京容 (XIE Jingrong), 董明媚 (DONG Mingmei)[2], 岳心阳 (YUE Xinyang)[2], 杨扬 (YANG Yang)[2]
Affiliations: [1] School of Information, Shanghai Ocean University, Shanghai 201306, China; [2] National Marine Information Center, Tianjin 300171, China
Source: Marine Forecasts (《海洋预报》), 2024, No. 3, pp. 61-70 (10 pages)
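Funding: National Key Research and Development Program of China (2021YFC3101601); Capacity Building Project for Selected Local Universities, Science and Technology Commission of Shanghai Municipality (20050501900).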
Abstract: This paper proposes a two-layer framework for ocean data quality control that combines multiple models. Several common classification algorithms serve as base learners that make a primary prediction of the data quality labels, and a Voting or Stacking strategy then determines the final quality flag of the ocean data. To address class imbalance, an adaptive undersampling strategy lowers the imbalance ratio of the data, and the Focal Loss function improves the model's ability to recognize hard-to-classify samples. The method is validated on the quality control of sea surface temperature and air temperature data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). The results show that, for the very rare class of erroneous samples, the F1 score (the harmonic mean of precision and recall) of the Voting or Stacking methods can reach 0.9806 and 0.9812 on sea surface temperature data, and 0.9985 and 0.9983 on air temperature data.
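The three ingredients named in the abstract (combining base-learner predictions by voting, Focal Loss, and the F1 score) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the toy labels, and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the adaptive undersampling and Stacking steps are omitted because the abstract does not specify them.

```python
import math
from collections import Counter


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p is the predicted probability of class 1, y is the true label (0 or 1).
    gamma and alpha are illustrative defaults, not values from the article.
    """
    p = min(max(p, 1e-12), 1.0 - 1e-12)        # avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def majority_vote(predictions):
    """Hard voting: predictions is a list of per-model label lists.

    With an odd number of models and binary labels there are no ties.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the chosen positive class (here: the rare 'erroneous' flag)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy quality flags: 1 = erroneous observation, 0 = good observation.
y_true = [0, 0, 1, 1, 0, 1]
model_a = [0, 0, 1, 0, 0, 1]   # hypothetical base-learner outputs
model_b = [0, 1, 1, 1, 0, 0]
model_c = [0, 0, 0, 1, 0, 1]

voted = majority_vote([model_a, model_b, model_c])
print("voted labels:", voted)
print("F1 of model_a:", f1_score(y_true, model_a))
print("F1 after voting:", f1_score(y_true, voted))
print("focal loss, easy vs hard positive:", focal_loss(0.9, 1), focal_loss(0.1, 1))
```

In this toy case each base learner makes a different mistake, so the majority vote recovers every true label; a hard positive sample (predicted probability 0.1) also receives a much larger focal loss than an easy one (0.9), which is the behavior the abstract relies on for hard-to-classify samples.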