Classification random forest with exact conditioning for spatial prediction of categorical variables  (Cited by: 1)


Author: Francky Fouedjio

Affiliation: [1] AngloGold Ashanti Australia Ltd., Growth and Exploration, 140 St. Georges Terrace, Perth, WA, 6000, Australia

Source: Artificial Intelligence in Geosciences (地学人工智能, English-language journal), 2021, Issue 1, pp. 82-95 (14 pages)

Abstract: Machine learning methods are increasingly used to spatially predict a categorical target variable when spatially exhaustive predictor variables are available within the study region. Although these methods show competitive spatial prediction performance, by construction they do not exactly honor the observed values of the categorical target variable at sampling locations. By contrast, competing geostatistical methods match the observed values at sampling locations exactly by design. In many geoscience applications, it is desirable to match the observed values of the categorical target variable at sampling locations exactly, especially when its measurements can reasonably be considered error-free. This paper addresses the problem of exact conditioning of machine learning methods for the spatial prediction of categorical variables. It introduces a classification random forest-based approach in which the categorical target variable is exactly conditioned to the data, and which therefore shares the exact conditioning property of competing geostatistical methods. The proposed method extends previous work dedicated to continuous target variables by using an implicit representation of the categorical target variable. The basic idea is to transform the ensemble of (categorical) classification tree predictors resulting from the traditional classification random forest into an ensemble of (continuous) signed distances associated with each category of the categorical target variable. An orthogonal representation of the ensemble of signed distances is then created through principal component analysis, which allows the exact conditioning problem to be reformulated as a system of linear inequalities on the principal component scores. New principal component scores that ensure exact conditioning to the data are then sampled via randomized quadratic programming. The resulting conditional signed distances are turned into an ensemble of categorical output
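The steps named in the abstract (signed distances, principal component analysis, linear inequalities on the scores, randomized quadratic programming) can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic `ensemble` stands in for the tree predictors of a classification random forest, the margin `eps` and the Gaussian draw of target scores `z0` are arbitrary choices, and the quadratic program is solved with Hildreth's procedure only to keep the sketch dependent on NumPy alone. The names `signed_distance` and `project_onto_halfspaces` are hypothetical.

```python
import numpy as np

def signed_distance(mask):
    """1-D signed distance to a category: negative inside, positive outside."""
    x = np.arange(mask.size)
    inside, outside = np.flatnonzero(mask), np.flatnonzero(~mask)
    d_to_inside = np.abs(x[:, None] - inside[None, :]).min(axis=1)
    d_to_outside = np.abs(x[:, None] - outside[None, :]).min(axis=1)
    return np.where(mask, -d_to_outside, d_to_inside).astype(float)

def project_onto_halfspaces(z0, A, b, sweeps=500):
    """Hildreth's procedure: argmin ||z - z0||^2 subject to A z <= b."""
    z, lam = z0.copy(), np.zeros(len(b))
    for _ in range(sweeps):
        for i, a in enumerate(A):
            # Move onto the violated half-space, or relax an inactive one.
            step = max(-lam[i], (a @ z - b[i]) / (a @ a))
            lam[i] += step
            z -= step * a
    return z

# Toy stand-in for the forest: each member is a 3-category map on 30 cells.
n_cells, n_cat = 30, 3
x = np.arange(n_cells)
ensemble = np.array([np.digitize(x, [b1, b2])
                     for b1 in range(6, 14) for b2 in range(17, 24)])

# Hypothetical error-free observations: (cell index, observed category).
data_cells = np.array([9, 18, 25])
data_cats = np.array([0, 1, 2])

# Step 1: categorical ensemble -> signed distances, one field per category.
sd = np.array([[signed_distance(m == k) for k in range(n_cat)]
               for m in ensemble])                 # (members, categories, cells)
X = sd.reshape(len(ensemble), -1)

# Step 2: PCA of the signed-distance ensemble via SVD.
mu = X.mean(axis=0)
_, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
keep = S > 1e-10 * S[0]
S, Vt = S[keep], Vt[keep]

# Step 3: exact conditioning as linear inequalities A z <= b on the scores:
# at each data cell, the observed category must have the smallest distance.
eps = 0.5  # assumed safety margin
rows, rhs = [], []
for i, c in zip(data_cells, data_cats):
    for k in range(n_cat):
        if k != c:
            j_c, j_k = c * n_cells + i, k * n_cells + i
            rows.append(Vt[:, j_c] - Vt[:, j_k])
            rhs.append(-eps - (mu[j_c] - mu[j_k]))
A, b = np.array(rows), np.array(rhs)

# Step 4: randomized QP: draw random target scores, then take the closest
# scores that satisfy the inequalities.
rng = np.random.default_rng(0)
score_std = S / np.sqrt(len(ensemble) - 1)
conditioned = []
for _ in range(20):
    z0 = rng.standard_normal(S.size) * score_std
    z = project_onto_halfspaces(z0, A, b)
    field = (mu + z @ Vt).reshape(n_cat, n_cells)
    # Step 5: signed distances -> categories (smallest signed distance wins).
    conditioned.append(field.argmin(axis=0))
conditioned = np.array(conditioned)

print(conditioned[:, data_cells])  # each row should equal data_cats
```

In this toy setting some members of the unconditioned ensemble misclassify cell 9, whereas every conditioned realization reproduces the three observed categories. A production implementation would more likely rely on a dedicated quadratic programming solver and a multi-dimensional distance transform.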

Keywords: Categorical variable; Classification; Exact conditioning; Principal component analysis; Signed distance; Spatial prediction; Quadratic programming

Classification code: O17 [Science: Mathematics]

 
