**In this tutorial, we show how to compute counterfactual explanations for positively predicted instances. We use movie viewing data (MovieLens 1M), where the goal is to predict gender (the positive class is a 'Female' user). A counterfactual explanation is a set of movies such that removing them from the user's viewing history changes the predicted class from 'Female' to 'Male'.**

**Import the required libraries.**

In [53]:

```
import pandas as pd
import numpy as np
import sedc_algorithm
from function_edc import fn_1
import scipy.sparse #csr_matrix is used when splitting the data
from sklearn.metrics import roc_auc_score, accuracy_score, precision_recall_fscore_support, f1_score, confusion_matrix
```

In [54]:

```
#Run the sedc_algorithm.py module
%run sedc_algorithm.py
```

**For this demonstration, we use the Movielens 1M data set, which contains movie viewing behavior of users. The target variable is binary (taking value 1 if gender = 'FEMALE' and 0 if gender = 'MALE').**

In [55]:

```
target = pd.read_csv('target_ML1M.csv')
target = 1-target #Flip the 0/1 labels so that 1 = 'FEMALE' (the positive class)
data = pd.read_csv('data_ML1M.csv')
feature_names = pd.read_csv('feature_names_ML1M.csv')
```

**Split the data into a training and a test set (80%/20%), then hold out 20% of the training set for validation. We use an L2-regularized Logistic Regression (LR) model, trained on the training set.**

In [37]:

```
from sklearn.model_selection import train_test_split
#Columns 1 to 3706 of data are used as features; 20% of the users are held out for testing
x_train, x_test, y_train, y_test = train_test_split(scipy.sparse.csr_matrix(data.iloc[:,1:3707].values), target.iloc[:,1], test_size=0.2, random_state=0)
#Hold out 20% of the remaining training data for validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=0)
```
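The two successive splits above produce three subsets: 20% of the users are held out for testing, and 20% of the remaining 80% (16% overall) form the validation set, which leaves 64% for training. The sketch below checks these proportions on 1,000 hypothetical row indices. It uses plain NumPy shuffling instead of `train_test_split`, and `two_stage_split` is an illustrative helper name, not part of the tutorial code.

```python
import numpy as np

def two_stage_split(n_rows, test_size=0.2, val_size=0.2, seed=0):
    """Shuffle row indices, then carve out a test and a validation subset.

    Illustrative helper only: it mimics the proportions of two
    successive 80/20 splits, not the exact rows sklearn would pick.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_test = int(round(test_size * n_rows))
    test_idx, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = int(round(val_size * rest.size))
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = two_stage_split(1000)
print(train_idx.size, val_idx.size, test_idx.size)  # 640 160 200
```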

In [38]:

```
from sklearn.linear_model import LogisticRegression
#Values of the regularization parameter C in L2-LR.
C = [10**(-3),10**(-2),10**(-1),10**(0),10**(1),10**(2)]
p = np.sum(y_train)/np.size(y_train)
print("The balance of target in training subset is %f." %p)
#There are 70% male users, 30% female users in the training data.
```

**We tune the regularization parameter C on the hold-out validation set and select the value with the highest validation accuracy.**

In [16]:

```
accuracy_vals=[]
for c in C:
    LR = LogisticRegression(penalty='l2', solver='sag', C = c) #L2-regularized Logistic Regression
    LR.fit(x_train, y_train)
    probs = LR.predict_proba(x_val)[:,1]
    #Threshold chosen so that a fraction p of the validation instances is predicted positive
    threshold_classifier_probs = np.percentile(probs,(100-(p*100)))
    predictions_probs = (probs >= threshold_classifier_probs) #Explicit, discrete predictions for validation data instances
    accuracy_val = accuracy_score(y_val, np.array(predictions_probs))
    accuracy_vals.append(accuracy_val)
print("The finetuning process has ended...")
#Refit the model with the best-performing value of C
C_optimal_accuracy = C[np.argmax(accuracy_vals)]
LR_best = LogisticRegression(penalty='l2', solver='sag', C = C_optimal_accuracy)
LR_best.fit(x_train, y_train)
```

Out[16]:

In [56]:

```
probs = LR_best.predict_proba(x_test)[:,1]
threshold_classifier_probs = np.percentile(probs,(100-(p*100)))
predictions_probs = (probs >= threshold_classifier_probs) #Explicit, discrete predictions for test data instances
accuracy_test = accuracy_score(y_test, np.array(predictions_probs))
print("The accuracy of the model on the test data is %f" %accuracy_test)
indices_probs_pos = np.nonzero(predictions_probs)#indices of the test instances that are positively-predicted
```
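The cells above turn predicted scores into class labels with a percentile-based threshold rather than a fixed 0.5 cutoff: the threshold is the (100 - 100p)-th percentile of the scores, so roughly a fraction p of the instances (the share of the positive class in the training data) is predicted positive. A minimal sketch of that step, using ten made-up scores and an assumed p = 0.3:

```python
import numpy as np

# Hypothetical predicted scores for ten instances (not MovieLens output).
scores = np.array([0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.55, 0.60, 0.80, 0.90])
p = 0.3  # assumed share of the positive class in the training data

# Scores at or above the 70th percentile are labeled positive.
threshold = np.percentile(scores, 100 - p * 100)
predictions = scores >= threshold

print(threshold)          # cutoff between the 7th and 8th sorted score
print(predictions.sum())  # number of positively predicted instances
```

Here three of the ten instances end up above the threshold, which matches the assumed 30% positive rate.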

In [57]:

```
classification_model = LR_best
def classifier_fn(X):
    #Return the predicted probability of the positive class ('FEMALE') for each row of X
    c=classification_model.predict_proba(X)
    y_predicted_proba=c[:,1]
    return y_predicted_proba
```

**Create an SEDC explainer object. By default, the SEDC algorithm stops searching as soon as a first explanation is found, when a 5-minute time limit is exceeded, or when more than 50 iterations are required (see edc_agnostic.py for more details). Only the active (nonzero) features are perturbed (set to zero) to evaluate the impact on the model's predicted output. In other words, only movies that the user has watched can become part of the counterfactual explanation of the model prediction.**

In [19]:

```
explainer_SEDC = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
                                threshold_classifier = threshold_classifier_probs,
                                classifier_fn = classifier_fn)
```
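To make the idea of perturbing only active features concrete, the following is a simplified greedy sketch, not the implementation in `sedc_algorithm.py`. It repeatedly zeroes out the active feature whose removal lowers the score the most and stops once the score falls below the threshold. The function `greedy_counterfactual`, the scoring function `toy_score`, and the weights are all hypothetical; the actual SEDC search also handles the time limit and keeps track of multiple candidate explanations.

```python
import numpy as np

def greedy_counterfactual(instance, score_fn, threshold, max_iter=50):
    """Greedily zero out active features until the score drops below threshold.

    Simplified illustration only. Returns the list of removed feature
    indices, or None if no counterfactual is found within max_iter removals.
    """
    current = instance.astype(float).copy()
    removed = []
    for _ in range(max_iter):
        active = np.flatnonzero(current)
        if active.size == 0:
            return None
        # Build one candidate per active feature, with that feature set to zero.
        candidates = np.repeat(current[None, :], active.size, axis=0)
        candidates[np.arange(active.size), active] = 0.0
        scores = score_fn(candidates)
        best = int(np.argmin(scores))
        removed.append(int(active[best]))
        current = candidates[best]
        if scores[best] < threshold:
            return removed
    return None

# Hypothetical linear model over five "movies" (weights are made up).
weights = np.array([1.5, -0.5, 0.8, 0.1, 2.0])

def toy_score(X):
    """Logistic score of a made-up linear model; stands in for classifier_fn."""
    return 1.0 / (1.0 + np.exp(-(X @ weights - 1.0)))

user = np.array([1, 1, 1, 0, 1])  # watched movies 0, 1, 2 and 4
explanation = greedy_counterfactual(user, toy_score, threshold=0.5)
print(explanation)  # [4, 0]
```

In this toy case, removing movie 4 alone is not enough to cross the threshold, so the search also removes movie 0. Movie 3 is never considered because the user did not watch it.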

**Show indices of positively-predicted test instances.**

In [58]:

```
indices_probs_pos #all instances that are predicted as 'FEMALE'
```

Out[58]:

**Explain why the user with index = 13 is predicted as a 'FEMALE' user by the model.**

In [77]:

```
index = 13
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
```

In [78]:

```
explanation[0]
```

Out[78]:

In [63]:

```
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
```

**Explain why the user with index = 15 is predicted as a 'FEMALE' user by the model.**

In [88]:

```
index = 15
instance_idx = x_test[index]
explanation = explainer_SEDC.explanation(instance_idx)
```

In [89]:

```
explanation[0]
print("IF the user did not watch the movie(s) " + str(explanation[0][0]) + ", THEN the predicted class would change from 'FEMALE' to 'MALE'.")
```

**Show more information about the explanation(s): explanation[0] shows the explanation set(s), explanation[1] shows the number of active features of the instance to explain, explanation[2] shows the number of explanations found, explanation[3] shows the number of features in the smallest-sized explanation, explanation[4] shows the time elapsed in seconds to find the explanation, explanation[5] shows the predicted score change when removing the feature(s) in the smallest-sized explanation, explanation[6] shows the number of iterations that the algorithm needed.**

In [90]:

```
explanation
```

Out[90]:
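Positional indexing into the result is easy to get wrong, so one option is to wrap the tuple in a `namedtuple`. The field names below are illustrative labels for the seven positions described above, and `raw` is a placeholder with made-up values standing in for real explainer output.

```python
from collections import namedtuple

# Illustrative field names for the seven positions of the result tuple.
ExplanationResult = namedtuple(
    "ExplanationResult",
    ["explanation_sets", "n_active_features", "n_explanations",
     "min_explanation_size", "elapsed_seconds", "score_change", "n_iterations"],
)

# Placeholder tuple with made-up values (not actual SEDC output).
raw = ([["Movie A", "Movie B"]], 40, 1, 2, 0.12, [0.07], 2)

result = ExplanationResult(*raw)
print(result.min_explanation_size, result.n_iterations)
```

In the notebook, `ExplanationResult(*explanation)` should work the same way, assuming the returned tuple has exactly the seven elements listed above.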

**Show the first 10 explanation(s) found by the SEDC algorithm for the user with index = 45. We set max_explained to 10.**

In [72]:

```
explainer_SEDC2 = SEDC_Explainer(feature_names = np.array(feature_names.iloc[:,1]),
                                 threshold_classifier = threshold_classifier_probs,
                                 classifier_fn = classifier_fn, max_explained = 10)
```

In [85]:

```
index = 45
instance_idx = x_test[index]
explanation = explainer_SEDC2.explanation(instance_idx)
```

**The algorithm finds 32 explanations after 3 iterations, in less than a second. The user has 122 active features (movies watched).**

In [87]:

```
explanation
```

Out[87]: