xgboost feature importance
Extracting and plotting feature importance
This post covers extracting feature (variable) importance from an xgboost model and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post; for steps to do the same in Python, I recommend his post.
If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, that is, how much it helped in the classification or prediction of an outcome.
This example will draw on the built-in Sonar data from the mlbench package.
Prepping the Environment
library(caret)      # train() and varImp()
library(xgboost)    # xgb.importance() and its plotting helpers
library(tidyverse)  # ggplot2 and friends
Loading the data
data("Sonar", package = "mlbench")
Train the xgboost model
xgb_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "xgbLinear")
Extract feature importance
Since we are using the caret package, we can use its built-in function to extract feature importance, or the function from the xgboost package. We will do both.
caret feature importance
caret_imp <- varImp(xgb_fit)
caret_imp
xgboost feature importance
xgb_imp <- xgb.importance(feature_names = xgb_fit$finalModel$feature_names,
                          model = xgb_fit$finalModel)

head(xgb_imp)
Plotting feature importance
caret
You have a few options when it comes to plotting feature importance. You can call plot() on the object returned by varImp(), or pass it directly to ggplot():
plot(caret_imp)
ggplot(caret_imp) + theme_minimal()
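If you want full control over the plot, you can also build one by hand from the underlying importance table. A minimal sketch, assuming caret_imp$importance stores the features as row names with a single Overall column (inspect it first if your output differs):

# Hypothetical custom plot built from the varImp() table
caret_imp$importance %>%
  rownames_to_column("Feature") %>%
  top_n(15, Overall) %>%
  ggplot(aes(x = reorder(Feature, Overall), y = Overall)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Importance") +
  theme_minimal()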
xgboost
You can use the plotting functionality from the xgboost package:
xgb.plot.importance(xgb_imp)
Or use its ggplot-based function:
xgb.ggplot.importance(xgb_imp)
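Because xgb.ggplot.importance() returns an ordinary ggplot object, you can layer on the same kinds of tweaks used above, for example:

# Customize the returned ggplot object like any other
xgb.ggplot.importance(xgb_imp) +
  theme_minimal()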
Originally published at http://josiahparry.com/post/xgb-feature-importance/ on December 1, 2018.