Measuring variable importance is often difficult: among other reasons, models can be complex, and covariates can interact with each other and be correlated. This study focuses on two questions. First, what should the theoretical measure of variable importance be under a given data-generating model? Second, what are the best estimates of these theoretical measures? Two theoretical measures and several corresponding estimates are presented, one of which is the well-known random forests variable importance measure (Breiman, 2001). A simulation study with both linear and nonlinear models is carried out to determine which estimates of variable importance perform best for a given data-generating model. Most measures struggle when covariates are correlated, but their performance improves when the number of candidate split variables is tuned.
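The random forests measure of Breiman (2001) is usually read as a permutation-based importance. It records how much the prediction error grows when one covariate's values are randomly shuffled, which breaks that covariate's link to the response. The plain-Python sketch below illustrates only that core idea under simplifying assumptions. It uses a fixed, known predictor in place of a fitted forest, and it measures error on the same data rather than on out-of-bag samples. The data-generating model and the function names `mse` and `permutation_importance` are hypothetical and chosen for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE after randomly permuting each covariate column.

    Simplified sketch (assumption): `predict` is any callable mapping one row
    (a list of floats) to a prediction; no forest and no out-of-bag samples.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other covariate untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total / n_repeats)
    return importances


def true_model(row):
    # Hypothetical linear data-generating model: y = 2*x1 + x2; x3 is noise.
    return 2 * row[0] + row[1]


if __name__ == "__main__":
    rng = random.Random(1)
    # Independent standard-normal covariates (assumption of this sketch).
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
    y = [true_model(row) for row in X]
    print(permutation_importance(true_model, X, y))
```

With independent standard-normal covariates, the printed importances should come out at roughly 8 and 2 for the first two covariates and exactly 0 for the third, because the predictor ignores it. Each column is permuted independently of the others, so a measure of this kind may behave differently once covariates are correlated, which is the setting in which most measures are reported to struggle.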