In this tutorial, we show how to compute counterfactual explanations for positively-predicted instances. We use movie viewing data (MovieLens 1M), where the goal is to predict gender ('FEMALE' user). A counterfactual explanation is a set of movies such that, when they are removed from the user's viewing history, the predicted class changes from 'FEMALE' to 'MALE'.
Import the libraries and load the data set.
import pandas as pd
import numpy as np
import sedc_algorithm
from function_edc import fn_1
import scipy.sparse  # needed for scipy.sparse.csr_matrix below
%run sedc_algorithm.py #run the sedc_algorithm.py module (defines the SEDC_Explainer class)
For this demonstration, we use the Movielens 1M data set, which contains movie viewing behavior of users. The target variable is binary (taking value 1 if gender = 'FEMALE' and 0 if gender = 'MALE').
target = pd.read_csv('target_ML1M.csv')
target = 1 - target  # flip the labels so that 1 = 'FEMALE' and 0 = 'MALE'
data = pd.read_csv('data_ML1M.csv')  # binary user-by-movie viewing matrix
feature_names = pd.read_csv('feature_names_ML1M.csv')  # movie titles
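As a quick sanity check, inspect the dimensions of the loaded objects and the class balance (the slicing data.iloc[:,1:3707] below assumes the first CSV column is an index, leaving 3,706 movie features):
print(data.shape, target.shape, feature_names.shape)
print(target.iloc[:,1].value_counts())  # class balance: 1 = 'FEMALE', 0 = 'MALE'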
Split the data into a training and a test set (80%-20%). We use the fine-tuned MLP hyperparameter configuration found in De Cnudde et al. (2018), 'An exploratory study towards applying and demystifying deep learning classification on behavioral big data'. We train the MLP classifier on the training data set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    scipy.sparse.csr_matrix(data.iloc[:,1:3707].values),  # drop the index column, keep the 3,706 movie features
    target.iloc[:,1], test_size=0.2, random_state=0)
from sklearn.neural_network import MLPClassifier
MLP_model = MLPClassifier(activation='relu', learning_rate_init=0.30452, alpha=0.0001,
                          learning_rate='adaptive', early_stopping=True,
                          hidden_layer_sizes=(532,135,1009), solver='lbfgs',
                          batch_size=100)  # note: batch_size and early_stopping are ignored by the 'lbfgs' solver
MLP_model.fit(x_train, y_train)
Calculate the Area under the ROC curve (AUC) of the model on the test set.
from sklearn.metrics import roc_auc_score
Scores = MLP_model.predict_proba(x_test)[:,1]  # predict scores using the trained MLP model
AUC = roc_auc_score(y_test, Scores)  # compute the AUC of the model
print("The AUC of the model is %f" % AUC)
Predict 25% of the test instances as positive (gender = 'FEMALE'), for example because of a limited targeting budget. Obtain the indices of the test instances predicted as 'FEMALE', i.e., the instances the model is most confident are 'FEMALE' users.
probs = MLP_model.predict_proba(x_test)[:,1]
threshold_classifier_probs = np.percentile(probs, 75)  # 75th percentile: the top 25% of scores are predicted positive
predictions_probs = (probs >= threshold_classifier_probs)
indices_probs_pos = np.nonzero(predictions_probs)  # indices of the test instances that are positively-predicted
probs[4] >= threshold_classifier_probs  # check whether test instance 4 is predicted as 'FEMALE'
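As a check, count how many test instances fall above the threshold; this should be roughly 25% of the test set:
print("Positively-predicted test instances: %d of %d" % (np.sum(predictions_probs), len(probs)))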
classification_model = MLP_model

def classifier_fn(X):
    """Return the predicted probability of the positive class ('FEMALE')."""
    c = classification_model.predict_proba(X)
    y_predicted_proba = c[:,1]
    return y_predicted_proba
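A quick sanity check that the wrapper returns one score per instance:
print(classifier_fn(x_test[:5]))  # five probabilities between 0 and 1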
Create an SEDC explainer object. By default, the SEDC algorithm stops searching for explanations when a first explanation is found, when a 5-minute time limit is exceeded, or when more than 50 iterations are required (see sedc_algorithm.py for more details). Only the active (nonzero) features are perturbed (set to zero) to evaluate the impact on the model's predicted output. In other words, only movies that a user has watched can become part of the counterfactual explanation of the model prediction.
explainer_SEDC = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
threshold_classifier = threshold_classifier_probs,
classifier_fn = classifier_fn)
Show indices of positively-predicted test instances.
indices_probs_pos #all instances that are predicted as 'FEMALE'
Explain why the user with index = 17 is predicted as a 'FEMALE' user by the model.
index = 17
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
Show the explanation(s) found.
explanation[0]
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
Explain why the user with index = 13 is predicted as a 'FEMALE' user by the model.
index = 13
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
Show more information about the explanation(s). The returned tuple contains:
- explanation[0]: the explanation set(s)
- explanation[1]: the number of active features of the instance to explain
- explanation[2]: the number of explanations found
- explanation[3]: the number of features in the smallest-sized explanation
- explanation[4]: the time elapsed (in seconds) to find the explanation
- explanation[5]: the predicted score change when removing the feature(s) in the smallest-sized explanation
- explanation[6]: the number of iterations that the algorithm needed
explanation
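For readability, the tuple can be unpacked into named variables (a minimal sketch, assuming the seven fields listed above, in that order):
expl_sets, n_active, n_explanations, size_smallest, time_elapsed, score_change, n_iterations = explanation
print("Found %d explanation(s); the smallest uses %d movie(s), found in %.2f seconds after %d iteration(s)." % (n_explanations, size_smallest, time_elapsed, n_iterations))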
Show the first 10 explanations found by the SEDC algorithm for the user with index = 13. We set max_explained to 10.
explainer_SEDC2 = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
threshold_classifier = threshold_classifier_probs,
classifier_fn = classifier_fn, max_explained = 10)
index = 13
instance_idx = x_test[index]
explanation = explainer_SEDC2.explanation(instance_idx)
Ten explanations are found after one iteration. The time elapsed is about 2 seconds. The instance has 173 active features (movies watched).
explanation
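Each element of explanation[0] is one counterfactual set of movies, so all ten can be printed as rules using the same template as before:
# Print every returned explanation as a counterfactual rule.
for movies in explanation[0]:
    print("IF the user did not watch the movie(s) " + str(movies) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")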