In this tutorial, we show how to compute counterfactual explanations for positively-predicted instances. We use movie viewing data (MovieLens 1M), where the goal is to predict gender (a 'FEMALE' user is the positive class). A counterfactual explanation is a set of movies such that, when they are removed from the user's viewing history, the predicted class changes from 'FEMALE' to 'MALE'.
Import the libraries and the data set.
import pandas as pd
import numpy as np
import sedc_algorithm
from function_edc import fn_1
import scipy
from sklearn.metrics import roc_auc_score, accuracy_score, precision_recall_fscore_support, f1_score, confusion_matrix
%run sedc_algorithm.py #run sedc_algorithm.py module
For this demonstration, we use the Movielens 1M data set, which contains movie viewing behavior of users. The target variable is binary (taking value 1 if gender = 'FEMALE' and 0 if gender = 'MALE').
target = pd.read_csv('target_ML1M.csv')
target = 1-target #flip the encoding so that 1 corresponds to 'FEMALE' and 0 to 'MALE'
data = pd.read_csv('data_ML1M.csv')
feature_names = pd.read_csv('feature_names_ML1M.csv')
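As an optional sanity check, we can inspect the shapes of the loaded files and the class balance (a quick sketch; it assumes the first column of each CSV file is an index column, consistent with the iloc[:,1:] slicing used in the next step).
print(data.shape) #one row per user, one column per movie (plus an index column)
print(feature_names.shape) #one row per movie title
print(target.iloc[:,1].value_counts()) #class balance: 1 = 'FEMALE', 0 = 'MALE'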
Split the data into a training and test set (80%-20%), and further split the training part into a training and validation subset (80%-20%). We use an L2-regularized Logistic Regression model and train the LR classifier on the training data.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(scipy.sparse.csr_matrix(data.iloc[:,1:3707].values), target.iloc[:,1], test_size=0.2, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=0)
from sklearn.linear_model import LogisticRegression
#Values of the regularization parameter C in L2-LR.
C = [10**(-3),10**(-2),10**(-1),10**(0),10**(1),10**(2)]
p = np.sum(y_train)/np.size(y_train)
print("The balance of target in training subset is %f." %p)
#There are 70% male users, 30% female users in the training data.
We tune the regularization parameter C on the hold-out validation set, selecting the value that maximizes validation accuracy. Instead of the default 0.5 cut-off, the classification threshold is chosen so that the fraction of instances predicted as 'FEMALE' equals the fraction p of 'FEMALE' users in the training data.
accuracy_vals=[]
for c in C:
    LR = LogisticRegression(penalty='l2', solver='sag', C = c) #L2-regularized Logistic Regression
    LR.fit(x_train, y_train)
    probs = LR.predict_proba(x_val)[:,1]
    threshold_classifier_probs = np.percentile(probs,(100-(p*100)))
    predictions_probs = (probs >= threshold_classifier_probs) #Explicit, discrete predictions for validation data instances
    accuracy_val = accuracy_score(y_val, np.array(predictions_probs))
    accuracy_vals.append(accuracy_val)
print("The finetuning process has ended...")
C_optimal_accuracy = C[np.argmax(accuracy_vals)]
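Optionally, we can report the validation accuracy for each candidate value of C together with the selected value (a small sketch using the variables computed above).
for c, acc in zip(C, accuracy_vals):
    print("C = %s: validation accuracy = %f" % (str(c), acc))
print("The selected regularization parameter is C = %s." % str(C_optimal_accuracy))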
LR_best = LogisticRegression(penalty='l2', solver='sag', C = C_optimal_accuracy)
LR_best.fit(x_train, y_train)
probs = LR_best.predict_proba(x_test)[:,1]
threshold_classifier_probs = np.percentile(probs,(100-(p*100)))
predictions_probs = (probs >= threshold_classifier_probs) #Explicit, discrete predictions for test data instances
accuracy_test = accuracy_score(y_test, np.array(predictions_probs))
print("The accuracy of the model on the test data is %f" %accuracy_test)
indices_probs_pos = np.nonzero(predictions_probs) #indices of the test instances that are positively-predicted
classification_model = LR_best
def classifier_fn(X):
    c = classification_model.predict_proba(X)
    y_predicted_proba = c[:,1]
    return y_predicted_proba
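The classifier_fn wrapper returns the predicted probability of the positive class ('FEMALE') and is the scoring function that the SEDC explainer will call. As a quick illustrative check, we can apply it to a few test instances.
print(classifier_fn(x_test[0:5])) #predicted probabilities of 'FEMALE' for the first five test instances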
Create an SEDC explainer object. By default, the SEDC algorithm stops looking for explanations when a first explanation is found, when a 5-minute time limit is exceeded, or when more than 50 iterations are required (see edc_agnostic.py for more details). Only the active (nonzero) features are perturbed (set to zero) to evaluate the impact on the model's predicted output. In other words, only the movies that a user has watched can become part of the counterfactual explanation of the model prediction.
explainer_SEDC = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
                                threshold_classifier = threshold_classifier_probs,
                                classifier_fn = classifier_fn)
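To illustrate the perturbation that SEDC relies on, the sketch below manually zeroes out a single watched movie of one test instance and shows how the predicted score for 'FEMALE' changes (the choice of instance and movie is arbitrary; this is not part of the SEDC API).
instance = x_test[13] #a positively-predicted test instance
active_features = instance.nonzero()[1] #column indices of the movies this user watched
perturbed = instance.copy().tolil()
perturbed[0, active_features[0]] = 0 #"remove" the first watched movie
print("Original score: %f" % classifier_fn(instance)[0])
print("Score after removing one movie: %f" % classifier_fn(perturbed.tocsr())[0])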
Show indices of positively-predicted test instances.
indices_probs_pos #all instances that are predicted as 'FEMALE'
Explain why the user with index = 13 is predicted as a 'FEMALE' user by the model.
index = 13
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
explanation[0]
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
Explain why the user with index = 15 is predicted as a 'FEMALE' user by the model.
index = 15
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
explanation[0]
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
Show more information about the explanation(s):
- explanation[0]: the explanation set(s)
- explanation[1]: the number of active features of the instance to explain
- explanation[2]: the number of explanations found
- explanation[3]: the number of features in the smallest-sized explanation
- explanation[4]: the time elapsed (in seconds) to find the explanation
- explanation[5]: the predicted score change when removing the feature(s) in the smallest-sized explanation
- explanation[6]: the number of iterations the algorithm needed
explanation
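For readability, the tuple can be unpacked into named variables (a sketch; it assumes the tuple contains exactly the seven elements listed above).
(explanation_sets, nb_active_features, nb_explanations, size_smallest_explanation,
 time_elapsed, score_change, nb_iterations) = explanation
print("Found %d explanation(s) in %f seconds, using %d iteration(s)." % (nb_explanations, time_elapsed, nb_iterations))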
Show the first 10 explanations found by the SEDC algorithm for the user with index = 45. We change max_explained to 10.
explainer_SEDC2 = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
                                 threshold_classifier = threshold_classifier_probs,
                                 classifier_fn = classifier_fn, max_explained = 10)
index = 45
instance_idx = x_test[index]
explanation = explainer_SEDC2.explanation(instance_idx)
There are 32 explanations found after 3 iterations, in less than a second. The instance has 122 active features (movies watched).
explanation
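To list the explanations individually, for example from smallest to largest, we can iterate over explanation[0] (a sketch; it assumes explanation[0] is a list of explanation sets, as described above).
for i, movies in enumerate(sorted(explanation[0], key=len)):
    print("Explanation %d (%d movie(s)): %s" % (i+1, len(movies), str(movies)))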