CreditScore_ceml

Counterfactual explanations for interpreting credit evaluations

Scenario

To increase productivity, a bank adopts ML software to evaluate the creditworthiness of new loan applicants.

The bank decides to base its credit decisions on the following applicant data:

  • Credit term (months)
  • Credit volume (€)
  • Installment rate (% of disposable income)
  • Present residence since (years)
  • Age of the applicant
  • Number of existing credits the applicant already holds at this bank
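The model sees these six attributes only as a numeric vector in a fixed column order. A minimal sketch of that mapping, assuming the column names printed for Mr. Sorglos further down; the list `FEATURE_ORDER` and the helper `to_feature_vector` are hypothetical and written for illustration only:

```python
# Hypothetical helper: the column order mirrors the feature names the notebook
# prints for an applicant (feature indices 0-5).
FEATURE_ORDER = [
    "Duration_in_month",
    "Credit_amount",
    "Installment_rate_in_percentage_of_disposable_income",
    "Present_residence_since",
    "Age_in_years",
    "Number_of_existing_credits_at_this_bank",
]


def to_feature_vector(applicant: dict) -> list:
    """Return the applicant's values in the column order the model expects.

    Raises KeyError if any of the six attributes is missing.
    """
    return [applicant[name] for name in FEATURE_ORDER]


# Usage: Mr. Sorglos' record, deliberately given in a different key order
soeren = {
    "Age_in_years": 35,
    "Credit_amount": 8858,
    "Duration_in_month": 48,
    "Installment_rate_in_percentage_of_disposable_income": 2,
    "Number_of_existing_credits_at_this_bank": 2,
    "Present_residence_since": 1,
}
print(to_feature_vector(soeren))  # [48, 8858, 2, 1, 35, 2]
```

Looking values up by name rather than relying on dictionary order makes a swapped column fail loudly (KeyError) or stay correct, instead of silently feeding the model the wrong attribute.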

The adopted ML system is trained on these attributes from previous credit approval processes:

In [1]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import warnings
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

from ceml.sklearn import generate_counterfactual

data_path = ""

# suppress warnings for clean output
warnings.simplefilter('ignore')

# Load data
data = np.load(data_path + "data.npz")

# extract details:
## X = case details 
## y = respective credit decisions
## features_desc = feature descriptions
X, y, features_desc = data["X"], data["y"], data["features_desc"]

# only select relevant data from all cases 
# (i.e., Credit term and volume, installment rate, etc.)
feat = [0, 1, 2, 3, 4, 5]
totnumfeat = len(feat)
X = X[:, feat]

# Create train-test data set
X, y = shuffle(X, y, random_state=4242)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

# Fit a classifier
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)
Out[1]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=5,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')
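`roc_auc_score` is imported in In [1] but never called. On the held-out split it could be used as `roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])`, assuming the labels are 0 and 1 so that column 1 of `predict_proba` holds the score for label 1. To show what that number measures, here is a minimal pure-Python sketch of the pairwise definition of ROC AUC: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counting one half. The name `pairwise_auc` is hypothetical, and the quadratic loop is for illustration only:

```python
def pairwise_auc(y_true, scores):
    """ROC AUC via its pairwise definition (O(n_pos * n_neg), illustration only).

    y_true: iterable of 0/1 labels; scores: higher means "more likely label 1".
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Usage on a toy example (not the credit data):
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The tie rule matters for this model: a tree of depth 5 has at most 32 leaves, so many applicants share exactly the same `predict_proba` score.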

For future credit approval processes, the bank will consult this model to decide whether an application is approved.
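Consulting the model for one application amounts to a single `predict` call on a one-row input. A minimal sketch, assuming the label convention the notebook uses further below (1 = rejected, 0 = not rejected). `decide` and `ThresholdModel` are hypothetical; the threshold rule is invented for illustration and is not the bank's fitted tree, it only mimics the scikit-learn `predict` interface:

```python
class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier with a scikit-learn-style predict().

    Invented rule: reject (label 1) when the applicant holds more than
    `max_credits` existing credits at the bank.
    """

    def __init__(self, max_credits=1):
        self.max_credits = max_credits

    def predict(self, rows):
        # feature index 5 = Number_of_existing_credits_at_this_bank
        return [1 if row[5] > self.max_credits else 0 for row in rows]


def decide(model, applicant_row):
    """Return 'rejected' or 'approved' for one applicant (label 1 = rejected)."""
    label = model.predict([applicant_row])[0]  # predict expects a list of rows
    return "rejected" if int(label) == 1 else "approved"


print(decide(ThresholdModel(), [48, 8858, 2, 1, 35, 2]))  # rejected
print(decide(ThresholdModel(), [48, 8858, 2, 1, 35, 1]))  # approved
```

Wrapping the single row in a list matches scikit-learn's 2-D input requirement, and indexing `[0]` turns the length-1 prediction into a scalar before it is interpreted.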

A new application: Sören Sorglos

The bank's customer Sören Sorglos would like to take out a loan to buy a new car.

He would like to apply for a loan of

  • 8858 €
  • with a term of 48 months
  • with an installment rate of 2% of Mr. Sorglos' monthly disposable income

At his bank appointment, the bank teller reviews his request as well as the personal details relevant to the loan decision:

In [2]:
# extract case Sören Sorglos
soeren_sorglos = X_test[29]

# print respective data
print("Customer data:")
f = list(features_desc[feat])
for name, value in zip(f, soeren_sorglos):
    print(" {0}: {1}".format(name, value))
Customer data:
 Duration_in_month: 48
 Credit_amount: 8858
 Installment_rate_in_percentage_of_disposable_income: 2
 Present_residence_since: 1
 Age_in_years: 35
 Number_of_existing_credits_at_this_bank: 2

As shown above, Mr. Sorglos

  • is 35 years old
  • has lived at his current residence for 1 year
  • already holds 2 credits at this bank

The bank teller uses the bank's internal ML system to evaluate his creditworthiness:

In [3]:
# use the model to predict the outcome of the application (label 1 = rejected)
print("Loan application rejected: {0}".format(bool(model.predict([soeren_sorglos])[0])))
Loan application rejected: True

The bank follows the ML system's recommendation and declines the credit request.

Understanding the system's decision

Mr. Sorglos asks why his credit application was rejected. Using a counterfactual explanation, the bank teller determines how Mr. Sorglos' data would have to change for his application to be approved:

In [4]:
# target state is 0 (i.e., application is not rejected)
y_target = 0

# generate counterfactual explanation; with return_as_dict=False the result is
# the counterfactual, its predicted label and the per-feature change (delta)
soeren_sorglos_cf, y_cf, delta = generate_counterfactual(model, soeren_sorglos, y_target, return_as_dict=False)

# print only the features that the counterfactual changes
print("\nAdapted customer data:")
cf_change = delta.nonzero()
for j in cf_change[0]:
    print(" {0}: {1}".format(f[j], soeren_sorglos_cf[j]))

print("\nRenewed evaluation based on adapted data:")
print(" Loan application rejected: {0}".format(bool(model.predict([soeren_sorglos_cf])[0])))
Adapted customer data:
 Number_of_existing_credits_at_this_bank: 1

Renewed evaluation based on adapted data:
 Loan application rejected: False

Thus, the bank can inform Mr. Sorglos that his application would have been evaluated positively if he had held only 1 instead of 2 existing credits at this bank.
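ceml performs the counterfactual search internally. The underlying idea can be illustrated with a much cruder, swapped-in technique that is not ceml's algorithm: change one integer-valued feature at a time by ±1, ±2, … and return the smallest change that flips the prediction to the target label. Restricting the search to features the applicant can actually influence avoids suggestions such as changing one's age. The function `brute_force_counterfactual`, its `allowed` and `lower` parameters, and the toy rule `toy_predict` are hypothetical and invented for this sketch:

```python
def brute_force_counterfactual(predict, x, y_target, allowed, max_step=10, lower=0):
    """Smallest single-feature integer change that makes predict(x_cf) == y_target.

    predict: function mapping one feature vector to a label (hypothetical interface).
    allowed: indices of features that may be changed.
    lower:   values may not drop below this bound (e.g. no negative credit counts).
    Returns (x_cf, index, delta), or None if no change up to max_step works.
    """
    for step in range(1, max_step + 1):  # try the smallest changes first
        for j in allowed:
            for delta in (-step, step):
                x_cf = list(x)
                x_cf[j] += delta
                if x_cf[j] < lower:
                    continue
                if predict(x_cf) == y_target:
                    return x_cf, j, delta
    return None


names = ["Duration_in_month", "Credit_amount",
         "Installment_rate_in_percentage_of_disposable_income",
         "Present_residence_since", "Age_in_years",
         "Number_of_existing_credits_at_this_bank"]


def toy_predict(row):
    # Invented rule for illustration: reject (1) with more than one existing credit.
    return 1 if row[5] > 1 else 0


x = [48, 8858, 2, 1, 35, 2]
# Age (index 4) is excluded because the applicant cannot change it.
result = brute_force_counterfactual(toy_predict, x, y_target=0, allowed=[0, 1, 2, 3, 5])
if result is not None:
    x_cf, j, delta = result
    print("{0}: {1} -> {2}".format(names[j], x[j], x_cf[j]))
# prints "Number_of_existing_credits_at_this_bank: 2 -> 1"
```

Every feature moves in raw units here, so a change of 1 € counts the same as giving up one credit. A usable method has to scale or weight features against each other, which is one reason to rely on a dedicated tool such as ceml rather than this sketch.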

References

Data used in this example: Statlog (German Credit Data) Data Set, UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

Counterfactual explanations generated with the ceml toolbox, developed and maintained by André Artelt: https://github.com/andreArtelt/ceml