Feature Importance in Decision Trees

This page provides study materials on decision trees and explainable AI. It introduces the intuition behind how decision trees work, explains how they are built by choosing splits that reduce an impurity measure such as the Gini impurity, and shows how feature importance is derived from those impurity reductions. The content also discusses how these concepts relate to explainable AI.
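
To make the impurity idea concrete, here is a minimal sketch (my own addition, not part of the lecture materials) of the Gini impurity, defined as 1 minus the sum of squared class proportions: a pure node scores 0, and an even two-class split scores 0.5.

import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([1, 1, 1, 1]))  # 0.0 -> pure node
print(gini_impurity([1, 1, 0, 0]))  # 0.5 -> maximally mixed two-class node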

(Re-)Watch the Lecture


Download the Lecture Slides

📄 Please click the image to download/view the slides (PDF).

Slides Preview


Listen to the Podcast

Using the lecture notes, I made an AI-generated podcast with NotebookLM.


Play with the Code

The following code shows how to use sklearn’s decision trees. If you want to practice implementing a decision tree yourself, check out the coding exercise I prepared in this GitHub Repo.

import pandas as pd             # to have nice data frames
from sklearn import tree        # for the decision tree
import matplotlib.pyplot as plt # for plotting
import numpy as np

data = pd.read_csv("dataset_tml.csv",index_col=0)

# Map categorical features and label
data = data.replace({"Yes": 1, "No": 0}) # yes and no strings are mapped to 1 and 0

# Split into features and label
X = data.iloc[:, :-1]  # all columns except "Passed"
Y = data.iloc[:, -1]   # the "Passed" column

# Create a decision tree classifier that uses the Gini impurity criterion
clf_gini = tree.DecisionTreeClassifier(criterion="gini")
clf_gini = clf_gini.fit(X, Y)
clf_gini.predict(X)

# Look into the feature importance
importances_gini = clf_gini.feature_importances_
print(importances_gini) # --> array([0.4, 0.6])

# Plot the entire tree
tree.plot_tree(clf_gini)
plt.show()
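
To see where these numbers come from, here is a rough sketch (my own illustration, not part of the original exercise) that recomputes the impurity-based importances from the fitted tree: every split contributes the weighted decrease in Gini impurity it achieves, these contributions are summed per feature, and the result is normalized to sum to 1. It reuses clf_gini and X from the code above.

t = clf_gini.tree_
total = t.weighted_n_node_samples[0]          # samples in the root node
importances = np.zeros(X.shape[1])

for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:                            # leaf node: no split, no contribution
        continue
    f = t.feature[node]
    # weighted impurity decrease achieved by the split at this node
    importances[f] += (
        t.weighted_n_node_samples[node] / total * t.impurity[node]
        - t.weighted_n_node_samples[left] / total * t.impurity[left]
        - t.weighted_n_node_samples[right] / total * t.impurity[right]
    )

importances /= importances.sum()              # normalize so the importances sum to 1
print(importances)  # should match clf_gini.feature_importances_

This is the "mean decrease in impurity" view of feature importance; since it is computed purely on the training data, it can overstate the importance of features that offer many possible split points.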

Final Decision Tree

Find Additional Study Materials

  1. Full Stanford Lecture on Decision Trees
  2. Small Lecture Video on Gini Impurity
  3. Book: Pattern Recognition and Machine Learning, Chapter 14.4
  4. Book: Interpretable Machine Learning, Chapter 9