xgboost feature importance
Extracting and plotting feature importance
This post covers extracting feature (variable) importance from an xgboost model and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post; for steps to do the same in Python, I recommend his post.
If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, that is, how much it helped in the classification or prediction of an outcome.
This example will draw on the built-in Sonar data from the mlbench package.
Prepping the Environment
library(caret)      # train() and varImp()
library(xgboost)    # xgb.importance() and its plotting helpers
library(tidyverse)  # ggplot2 and friends
Loading the data
data("Sonar", package = "mlbench")
Train the xgboost model
xgb_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "xgbLinear")
Extract feature importance
Since we are using the caret package, we can use its built-in function to extract feature importance, or the function from the xgboost package. We will do both.
caret feature importance
caret_imp <- varImp(xgb_fit)
caret_imp
xgboost feature importance
xgb_imp <- xgb.importance(feature_names = xgb_fit$finalModel$feature_names,
                          model = xgb_fit$finalModel)

head(xgb_imp)
Plotting feature importance
caret
You have a few options when it comes to plotting feature importance. You can call plot() on the object returned by varImp(), or pass it directly to ggplot():
plot(caret_imp)
ggplot(caret_imp) + theme_minimal()
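If you want full control over the plot, you can also build one by hand from the underlying importance table. A minimal sketch, assuming caret_imp$importance stores the features as row names with a single Overall column (inspect it first if your output differs):

# Hypothetical custom plot built from the varImp() table
caret_imp$importance %>%
  rownames_to_column("Feature") %>%
  top_n(15, Overall) %>%
  ggplot(aes(x = reorder(Feature, Overall), y = Overall)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Importance") +
  theme_minimal()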
xgboost
You can use the plotting functionality from the xgboost package:
xgb.plot.importance(xgb_imp)
Or use its ggplot-based function:
xgb.ggplot.importance(xgb_imp)
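Because xgb.ggplot.importance() returns an ordinary ggplot object, you can layer on the same kinds of tweaks used above, for example:

# Customize the returned ggplot object like any other
xgb.ggplot.importance(xgb_imp) +
  theme_minimal()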
Originally published at http://josiahparry.com/post/xgb-feature-importance/ on December 1, 2018.