Feature Importance in Decision Trees
This page provides study materials on decision trees and explainable AI. It introduces the intuition behind how decision trees work and explains how they are built using impurity-based split criteria such as the Gini impurity. The content also covers how feature importance is derived from these impurity reductions and discusses how these concepts relate to explainable AI.
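As a quick refresher on the impurity criterion used below: the Gini impurity of a node is one minus the sum of squared class proportions, and a candidate split is scored by how much it reduces the weighted impurity of the resulting child nodes. The snippet below is a minimal sketch of that calculation; the function names and example labels are illustrative and not part of the lecture code.
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(parent, left, right):
    # Weighted decrease in Gini impurity achieved by splitting parent into left and right
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# A perfectly separating split removes all impurity: 0.5 - 0 - 0 = 0.5
print(impurity_decrease([1, 1, 0, 0], [1, 1], [0, 0]))  # --> 0.5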
(Re-)Watch the Lecture
Download the Lecture Slides
📄 Please click the image to download/view the slides (PDF).
Listen to the Podcast
Using the lecture notes, I made an AI-generated podcast with NotebookLM.
Play with the Code
The following code shows you how to use sklearn’s decision trees. If you want to practice implementing a decision tree yourself, please check out the coding exercise I prepared in this GitHub Repo.
import pandas as pd # to have nice data frames
from sklearn import tree # for the decision tree
import matplotlib.pyplot as plt # for plotting
import numpy as np
data = pd.read_csv("dataset_tml.csv",index_col=0)
# Map categorical features and label
data = data.replace({"Yes": 1, "No": 0}) # yes and no strings are mapped to 1 and 0
# Split into features and label
X = data.iloc[:, :-1] # all columns except "Passed"
Y = data.iloc[:, -1] # the "Passed" column
# Train a decision tree classifier using the Gini impurity criterion
clf_gini = tree.DecisionTreeClassifier(criterion="gini")
clf_gini = clf_gini.fit(X, Y)
clf_gini.predict(X) # predictions for the training data
# Look into the feature importance
importances_gini = clf_gini.feature_importances_ # impurity-based importance (mean decrease in impurity)
print(importances_gini) # --> array([0.4, 0.6])
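# Optional sketch (not part of the original lecture code): pair each importance
# with its column name so the values are easier to read
print(pd.Series(importances_gini, index=X.columns).sort_values(ascending=False))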
# Plot the entire tree
tree.plot_tree(clf_gini, feature_names=list(X.columns), filled=True)
plt.show()
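If you prefer a text rendering of the learned rules over the matplotlib figure, sklearn also provides export_text, which works on the same fitted classifier. This is a minimal sketch assuming the clf_gini and X objects from the code above:
from sklearn.tree import export_text

# Print the learned splits as indented if/else-style rules
rules = export_text(clf_gini, feature_names=list(X.columns))
print(rules)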
Find Additional Study Materials
- Full Stanford Lecture on Decision Trees
- Small Lecture Video on Gini Impurity
- Book: Pattern Recognition and Machine Learning, Chapter 14.4
- Book: Interpretable Machine Learning, Chapter 9