xgboost feature importance

Extracting and plotting feature importance

Josiah Parry
Dec 1, 2018

This post goes over extracting feature (variable) importance from an xgboost model and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post; for steps to do the same in Python, I recommend his post.

If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In Chris Albon’s flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature: how much it helped in the classification or prediction of an outcome.

This example will draw on the built-in dataset Sonar from the mlbench package.

Prepping the Environment

library(caret) 
library(xgboost)
library(tidyverse)

Loading the data

data("Sonar", package = "mlbench")

Train the xgboost model

xgb_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "xgbLinear")

Extract feature importance

Since we are using the caret package, we can use its built-in varImp() function to extract feature importance, or the xgb.importance() function from the xgboost package. We will do both.

caret feature importance

caret_imp <- varImp(xgb_fit)
caret_imp

xgboost feature importance

xgb_imp <- xgb.importance(feature_names = xgb_fit$finalModel$feature_names,
                          model = xgb_fit$finalModel)
head(xgb_imp)

Plotting feature importance

caret

You have a few options when it comes to plotting feature importance. You can call plot on the saved object from caret as follows:

plot(caret_imp)
ggplot(caret_imp) + theme_minimal()
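With 60 predictors, both plots get crowded. If I remember the caret API correctly, the plot() and ggplot() methods for varImp objects accept a top argument, so you can show only the strongest features:

plot(caret_imp, top = 20)
ggplot(caret_imp, top = 20) + theme_minimal()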

xgboost

You can use the plotting functionality from xgboost:

xgb.plot.importance(xgb_imp)

Or use its ggplot-based version:

xgb.ggplot.importance(xgb_imp)
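And since the tidyverse is already loaded, you can always roll your own plot from the importance table. Here is one sketch, assuming the linear booster’s importance column is named Weight (tree boosters report Gain instead; check names(xgb_imp) for your model):

xgb_imp %>%
  as_tibble() %>%
  # Weight is the importance measure for a linear booster; swap in Gain for trees
  ggplot(aes(x = reorder(Feature, Weight), y = Weight)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Importance") +
  theme_minimal()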

Originally published at http://josiahparry.com/post/xgb-feature-importance/ on December 1, 2018.
