In this tutorial, we show how to compute counterfactual explanations for positively-predicted instances. We use textual data (20newsgroups), where the goal is to predict whether a document is about a 'Medical' topic. A counterfactual explanation is a set of words such that, when they are removed from the document, the predicted topic is no longer 'Medical'.
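To make the definition concrete, here is a toy sketch with a hypothetical linear 'Medical' scorer over word sets. The weights, bias and threshold are made-up values, not taken from the tutorial's SVM. The sketch only checks whether a given set of removed words flips a positive prediction; it does not search for such a set.

```python
# Hypothetical linear scorer: word -> weight (made-up illustrative values).
WEIGHTS = {"doctor": 1.2, "patient": 0.9, "pain": 0.6, "the": 0.0}
BIAS = -1.0
THRESHOLD = 0.0  # assumption: score >= THRESHOLD means "predicted Medical"


def score(words):
    """Linear score of a document represented as a set of words."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)


def is_medical(words):
    return score(words) >= THRESHOLD


def is_counterfactual(document, removed):
    """True if removing `removed` from a positively predicted document flips the prediction."""
    if not is_medical(document):
        raise ValueError("only positively-predicted documents are explained")
    return not is_medical(set(document) - set(removed))


doc = {"the", "doctor", "patient", "pain"}
```

Removing only "doctor" leaves the document 'Medical', while removing "doctor" and "patient" together flips it, so only the second set is a counterfactual explanation in this toy model.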
Import the libraries.
import pandas as pd
import numpy as np
import sedc_algorithm
from function_edc import fn_1
# %run executes the module in the notebook namespace, which makes SEDC_Explainer available below
%run sedc_algorithm.py
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC
import sklearn.feature_extraction
from sklearn.feature_extraction.text import TfidfVectorizer
For this tutorial, we will use the 20newsgroups data set. For simplicity, we use a binary target variable: medical topic (sci.med) vs. non-medical topic.
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'), categories=categories)
First, we preprocess the raw textual data into a structured data format that can be used for modelling. We lowercase all words in the documents, remove stopwords and lemmatize the textual data.
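The same four steps can be sketched without NLTK. This is a simplified stand-in under stated assumptions: a tiny hand-picked stopword list replaces NLTK's English list, a regular expression replaces word_tokenize, and a hypothetical plural-stripping rule (toy_lemmatize) replaces WordNetLemmatizer, so the output will differ from the tutorial's on real text. Only the order of the steps is meant to match.

```python
import re

# Assumed stand-ins, NOT the NLTK resources used in the tutorial.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "my"}
TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def toy_lemmatize(token):
    # Hypothetical rule: strip a trailing "s" from longer alphabetic tokens.
    if token.isalpha() and len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(document):
    lowered = document.lower()                        # 1. lowercase
    tokens = TOKEN_RE.findall(lowered)                # 2. tokenize
    kept = [t for t in tokens if t not in STOPWORDS]  # 3. remove stopwords
    lemmas = [toy_lemmatize(t) for t in kept]         # 4. lemmatize
    return " ".join(lemmas)
```

For example, preprocess("The Doctors prescribed pills.") yields "doctor prescribed pill .".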
newsgroups_data = newsgroups.data
### Lowercase (normalization) ###
data_ = [story.lower() for story in newsgroups_data]
### Remove stopwords ###
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words('english'))
newsgroups_dataset = []
for story in data_:
    text = ""
    for word in word_tokenize(story):
        if word not in stop_words:
            text += " " + word
    newsgroups_dataset.append(text)
# Tokenized copy WITHOUT stopword removal, used later to display the documents
newsgroups_dataset_2 = []
for story in data_:
    text = ""
    for word in word_tokenize(story):
        text += " " + word
    newsgroups_dataset_2.append(text)
# Import lemmatizer modules.
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
newsgroups_lemma = []
for story in newsgroups_dataset:
    text = ""
    for word in word_tokenize(story):
        text += " " + str(lemmatizer.lemmatize(word))
    newsgroups_lemma.append(text)
We create a vectorizer object that transforms the preprocessed documents (lowercased, stopwords removed, lemmatized) into term frequency-inverse document frequency (tf-idf) vectors. Terms that occur in fewer than two documents are ignored (min_df=2).
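For intuition, the following pure-Python sketch computes what TfidfVectorizer(min_df=2) produces under scikit-learn's default settings as we understand them: the default token pattern (tokens of two or more word characters), raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalized rows. The function name tfidf and the dictionary-based output are illustrative choices; the tutorial's vectorizer returns a sparse matrix instead.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern


def tfidf(documents, min_df=2):
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(term for term, count in df.items() if count >= min_df)
    # Smoothed idf, as if one extra document contained every term once
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize each document vector (empty documents stay empty)
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vocabulary, vectors
```

With the documents "doctor patient pain", "doctor car" and "car engine car", only "car" and "doctor" survive min_df=2, and every non-empty row has unit length.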
vectorizer = sklearn.feature_extraction.text.TfidfVectorizer(min_df=2)
Split the data into a training, validation and test set (60-20-20%).
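Because the validation set is carved out of the 80% training portion (25% of 80% = 20%), the final proportions are roughly 60/20/20. The sketch below reproduces the sizes with the standard library, assuming scikit-learn's rounding rule for a fractional test_size (the test part is rounded up). It uses Python's own random generator, so the actual index assignment differs from train_test_split with random_state=0; only the sizes are meant to match.

```python
import math
import random


def split_sizes(n_samples, test_size):
    # Assumed rounding rule for a fractional test_size: round the test part up
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def three_way_split(n_samples, seed=0):
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_rest, n_test = split_sizes(n_samples, 0.2)   # 80% / 20%
    test, rest = indices[:n_test], indices[n_test:]
    n_train, n_val = split_sizes(n_rest, 0.25)     # 25% of the 80% = 20% overall
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

For the 18,846 documents this rule gives 11,307 training, 3,769 validation and 3,770 test indices, with no overlap between the three sets.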
# Seed for random_state = 0
indices = np.arange(len(newsgroups_lemma))  # 18846 documents
from sklearn.model_selection import train_test_split
indices_train, indices_test = train_test_split(indices, test_size=0.2, random_state=0)
indices_train, indices_val = train_test_split(indices_train, test_size=0.25, random_state=0)
# Make data splits from preprocessed textual data #
newsgroups_lemma_train = list(newsgroups_lemma[i] for i in indices_train)
newsgroups_lemma_test = list(newsgroups_lemma[i] for i in indices_test)
newsgroups_lemma_val=list(newsgroups_lemma[i] for i in indices_val)
Fit the vectorizer on the training data. Transform the training, validation and test data using this vectorizer.
x_train = vectorizer.fit_transform(newsgroups_lemma_train)
x_test = vectorizer.transform(newsgroups_lemma_test)
x_val = vectorizer.transform(newsgroups_lemma_val)
Extract the target variable (1 refers to a medical topic, 0 to another topic).
# Topic: sci.med (label 1) vs. all other topics (label 0)
med_label = newsgroups.target_names.index('sci.med')  # 13 for this category list
y = (newsgroups.target == med_label).astype(int)
y_train = y[indices_train]
y_test = y[indices_test]
y_val = y[indices_val]
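We use a Support Vector Machine model with a linear kernel and fine-tune the regularization parameter C on a hold-out validation data set. Because the classes are imbalanced, a document is predicted as 'Medical' when its SVM decision score falls in the top p share of the scores, where p is the share of 'Medical' documents in the training data. The value of C with the highest validation accuracy is kept.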
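The thresholding rule can be sketched without NumPy. percentile below follows the linear-interpolation rule that numpy.percentile uses by default; predict_top_fraction is a hypothetical helper that, like the tutorial's code, labels a document positive when its score is at or above the (100 - 100*p)th percentile of all scores. The scores are decision-function values, not probabilities, even though the tutorial's variables are named probs.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (numpy.percentile's default rule)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def predict_top_fraction(scores, positive_rate):
    """Label as positive the scores in the top `positive_rate` share."""
    threshold = percentile(scores, 100 - positive_rate * 100)
    return threshold, [s >= threshold for s in scores]
```

With 100 evenly spaced scores and positive_rate=0.05, the threshold lands between the 95th and 96th smallest score, so exactly five documents are predicted positive.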
C = [10**(-3),10**(-2),10**(-1),10**(0),10**(1),10**(2)]
p = np.sum(y_train)/np.size(y_train)
print("The balance of target in training subset is %f." %p)
# About 5% of the documents in the training data have a 'Medical' topic.
accuracy_vals = []
for c in C:
    # probability=True is not needed: only decision_function scores are used
    SVC_model = SVC(C=c, kernel="linear")
    SVC_model.fit(x_train, y_train)
    probs = SVC_model.decision_function(x_val)
    # Predict 'Medical' for the top p share of validation scores
    threshold_classifier_probs = np.percentile(probs, (100 - (p * 100)))
    predictions_probs = (probs >= threshold_classifier_probs)  # Explicit, discrete predictions for validation data instances
    accuracy_val = accuracy_score(y_val, np.array(predictions_probs))
    accuracy_vals.append(accuracy_val)
print("The fine-tuning process has ended...")
C_optimal_accuracy = C[np.argmax(accuracy_vals)]
SVC_best = SVC(C=C_optimal_accuracy, kernel="linear")
SVC_best.fit(x_train, y_train)
probs = SVC_best.decision_function(x_test)
threshold_classifier_probs = np.percentile(probs, (100 - (p * 100)))
predictions_probs = (probs >= threshold_classifier_probs)  # Explicit, discrete predictions for test data instances
accuracy_test = accuracy_score(y_test, np.array(predictions_probs))
print("The accuracy of the model on the test data is %f" % accuracy_test)
indices_probs_pos = np.nonzero(predictions_probs)  # Indices of the test documents that are positively-predicted
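classification_model = SVC_best
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
feature_names = list(vectorizer.get_feature_names_out())
def classifier_fn(X):
    # Return the SVM decision scores (not probabilities) for the instance(s) in X
    return classification_model.decision_function(X)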
Create an SEDC explainer object. By default, the SEDC algorithm stops looking for explanations when a first explanation is found, when a 5-minute time limit is exceeded, or when more than 50 iterations are required (see edc_agnostic.py for more details). Only the active (nonzero) features are perturbed (set to zero) to evaluate the impact on the model's predicted output. In other words, only words that occur in the document can become part of the counterfactual explanation of the model prediction.
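To illustrate the idea of the search, here is a simplified best-first sketch; it is not the code in sedc_algorithm.py, and its interface is an assumption. It works on word sets rather than tf-idf vectors: score_fn is a hypothetical function mapping the words that remain in the document to a model score (for example, a linear SVM's decision score), and removing a word plays the role of setting that feature to zero. Candidate removals are expanded in order of the lowest resulting score, and the first candidate whose score drops below the threshold is returned, so the result is an explanation but is not guaranteed to be the smallest one.

```python
import heapq


def sedc_sketch(active_words, score_fn, threshold, max_iter=50):
    """Best-first search for a word set whose removal pushes the score below `threshold`.

    active_words: words present in the document (only these may be removed)
    score_fn: hypothetical function, frozenset of remaining words -> model score
    """
    full = frozenset(active_words)
    # Heap entries: (score after removal, sorted tuple of removed words)
    frontier = [(score_fn(full - {w}), (w,)) for w in sorted(full)]
    heapq.heapify(frontier)
    seen = {frozenset(removed) for _, removed in frontier}
    for _ in range(max_iter):
        if not frontier:
            break
        new_score, removed = heapq.heappop(frontier)
        if new_score < threshold:
            return removed  # first explanation found
        # Expand the most promising candidate by one extra word
        for w in sorted(full - set(removed)):
            candidate = frozenset(removed) | {w}
            if candidate not in seen:
                seen.add(candidate)
                heapq.heappush(frontier, (score_fn(full - candidate), tuple(sorted(candidate))))
    return None  # no explanation within max_iter expansions
```

With a hypothetical linear scorer such as lambda remaining: -1.0 + sum(weights[w] for w in remaining) and a threshold of 0.0, the call returns the first word combination whose removal makes the score negative.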
explainer_SEDC = SEDC_Explainer(feature_names = feature_names,
threshold_classifier = threshold_classifier_probs,
classifier_fn = classifier_fn)
Show indices of positively-predicted test instances.
indices_probs_pos #all documents that have a predicted 'Medical' topic
Explain why the document with index = 73 is predicted as a 'Medical' topic by the model.
newsgroups_test = list(newsgroups_dataset_2[i] for i in indices_test)
The document looks as follows.
newsgroups_test[73]
index = 73
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
The explanation contains 17 words out of the 88 featurized words that are used by the SVM model.
Show more information about the explanation(s): explanation[0] shows the explanation set(s), explanation[1] shows the number of active features of the instance to explain, explanation[2] shows the number of explanations found, explanation[3] shows the number of features in the smallest-sized explanation, explanation[4] shows the time elapsed in seconds to find the explanation, explanation[5] shows the predicted score change when removing the feature(s) in the smallest-sized explanation, explanation[6] shows the number of iterations that the algorithm needed.
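Indexing the result by position is easy to get wrong, so a named view can help. The namedtuple and its field names below are hypothetical conveniences derived from the description above, not part of the SEDC module; the sketch assumes explanation is an indexable sequence with at least seven entries in the order listed.

```python
from collections import namedtuple

# Hypothetical field names, following the order explanation[0] ... explanation[6]
Explanation = namedtuple(
    "Explanation",
    ["explanation_sets", "n_active_features", "n_explanations",
     "smallest_size", "seconds", "score_change", "n_iterations"],
)


def summarize(explanation):
    """Turn the positional explanation result into a one-line summary."""
    e = Explanation(*explanation[:7])
    return (
        f"{e.n_explanations} explanation(s) found in {e.n_iterations} iteration(s) "
        f"({e.seconds} s); the smallest removes {e.smallest_size} "
        f"of {e.n_active_features} active words: {e.explanation_sets[0]}"
    )
```

Calling summarize(explanation) on the result above prints the same information as the raw tuple, labeled by field.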
explanation
print("IF the document did not contain the word(s) " + str(explanation[0][0]) + ", THEN the predicted topic would no longer be 'Medical'.")
Explain why the document with index = 165 is predicted as a 'Medical' topic by the model.
The document looks as follows.
newsgroups_test[165]
index = 165
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
print("IF the document did not contain the word(s) " + str(explanation[0][0]) + ", THEN the predicted topic would no longer be 'Medical'.")
explanation