I’m working as an AI Engineer at Loop Earplugs.
Before joining Loop, I gained experience designing and implementing advanced analytics solutions as a consultant at QuantumBlack (AI by McKinsey), working on projects across industries, topics, and countries. Before McKinsey, I obtained my PhD in Data Science from the University of Antwerp (fellowship granted by the Research Foundation-Flanders), under the supervision of Prof. David Martens. In my research, I contributed to the field of Explainable AI by developing new methods to explain black-box AI models trained on big human behavioral data (think of web browsing data, GPS locations, financial transactions, social media data, …).
Understanding preferences for explanations generated by XAI algorithms (2021). We studied people’s preferences for explanations of algorithmic decisions. We focused on three main attributes that describe automatically generated explanations from existing XAI algorithms (format, complexity, and specificity), and captured differences across contexts (advertising vs. loan applications) and users’ cognitive styles. We found that counterfactual explanations are not popular among users, unless they follow a negative outcome (e.g., a loan application was denied). We also found that users are willing to tolerate some complexity in explanations. Finally, our results suggest that preferences for specific (vs. more abstract) explanations are related to the level at which the user construes the decision, and to the deliberateness of the user’s cognitive style. Read the working paper here.
Gaining insight into prediction models using grouped features (2021). We proposed a novel technique to extract global explanations for prediction models using grouped features or “metafeatures”. The technique uses rule extraction to derive a set of rules that mimic the decisions of the black-box model, whose inner workings are not interpretable. Metafeatures are high-level features that cluster fine-grained features: for example, Facebook pages can be grouped into categories (e.g., “Fashion”, “Data Science”) and credit card transactions into spending categories (e.g., “Coffee Shops”). A key finding of our study is that explanation rules built from metafeatures (e.g., spending categories) mimic the black-box model better than rules extracted from the original behavioral features (e.g., millions of possible financial transactions).
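To make the idea concrete, here is a minimal, self-contained sketch (not the paper’s implementation; the data, the grouping, and all names are invented): fine-grained binary features are summed into metafeatures, and a small decision tree is fit to mimic the black-box predictions, with fidelity measuring how closely the surrogate’s rules match the black box.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Toy behavioral data: 1,000 users x 30 fine-grained binary features
# (e.g., individual Facebook page likes), with a hypothetical grouping
# of every 10 consecutive features into one metafeature (a "category").
X = rng.integers(0, 2, size=(1000, 30))
y = (X[:, :10].sum(axis=1) > 5).astype(int)  # synthetic target

black_box = LogisticRegression().fit(X, y)  # stand-in for the black box
y_bb = black_box.predict(X)                 # predictions to mimic

# Metafeatures: sum the fine-grained features within each group.
X_meta = X.reshape(1000, 3, 10).sum(axis=2)

# Interpretable surrogate trained to mimic the black-box predictions.
surrogate = DecisionTreeClassifier(max_depth=2).fit(X_meta, y_bb)
fidelity = (surrogate.predict(X_meta) == y_bb).mean()
print(f"fidelity to black box: {fidelity:.2f}")
print(export_text(surrogate, feature_names=["group_A", "group_B", "group_C"]))
```

The fidelity score (fraction of instances where the surrogate agrees with the black box) is the same kind of fidelity criterion used to judge how well explanation rules mimic the model.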
Using LIME and SHAP to counterfactually explain model predictions (2020). We developed novel algorithms (LIME-C and SHAP-C) for computing counterfactual explanations by aligning additive feature attribution explanation methods with the notion of counterfactuals. Counterfactual explanations are tailored to a single model decision and show a set of features such that, when removing them (setting their values to zero), the predicted class changes. (Read my KDnuggets blogpost on counterfactual explanations here.) Martens & Provost (2013) proposed SEDC, a heuristic algorithm based on local improvement, originally in the context of explaining document classifications (open-source code here). In this paper, we compared the effectiveness and efficiency of the two novel algorithms against SEDC, and found that LIME-C is a suitable alternative to SEDC, especially for data with large instances (many active features).
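As an illustrative sketch of the counterfactual idea (a simplified greedy search in the spirit of SEDC’s local-improvement heuristic; the data, model, and function below are invented for illustration, not the paper’s code): repeatedly zero out the active feature whose removal lowers the predicted-class probability most, until the prediction flips.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy sparse behavioral data: which of 20 "pages" each user liked.
X = rng.integers(0, 2, size=(500, 20))
y = (X[:, 0] + X[:, 3] + X[:, 7] > 1).astype(int)
model = LogisticRegression().fit(X, y)

def counterfactual(model, x, max_size=10):
    """Greedy search: at each step, zero out the active feature whose
    removal most reduces the predicted-class probability, stopping as
    soon as the predicted class flips."""
    target = model.predict([x])[0]
    x_cf, removed = x.copy(), []
    for _ in range(max_size):
        active = np.flatnonzero(x_cf)
        if len(active) == 0:
            break
        scores = []
        for j in active:                     # try each single removal
            trial = x_cf.copy()
            trial[j] = 0
            scores.append(model.predict_proba([trial])[0, target])
        best = active[int(np.argmin(scores))]
        x_cf[best] = 0
        removed.append(int(best))
        if model.predict([x_cf])[0] != target:
            return removed                   # these features explain the decision
    return None                              # no counterfactual within max_size

# Explain the first instance the model classifies as positive.
x = X[np.flatnonzero(model.predict(X) == 1)[0]]
print("removing features", counterfactual(model, x), "flips the prediction")
```

For binary “present/absent” behavioral features, removal simply means setting the value to zero, which is why this search works directly on the active features of a single instance.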
Implications of Cloaking Digital Footprints for Privacy and Personalization. (2023). Sofie Goethals, Sandra Matz, Foster Provost, David Martens and Yanou Ramon. Working paper available here.
Explainable AI for Psychological Profiling from Behavioral Data: An Application to Big Five Personality Predictions from Financial Transaction Records. (2021). Yanou Ramon, R.A. Farrokhnia, Sandra C. Matz, and David Martens. Information, 12(12), 518. Available online.
Can Metafeatures Help Improve Explanations of Prediction Models When Using Behavioral and Textual Data? (2021). Yanou Ramon, David Martens, Theodoros Evgeniou and Stiene Praet. Machine Learning. Available online. Full pdf available here.
A Comparison of Instance-level Counterfactual Explanation Algorithms for Behavioral and Textual Data: SEDC, LIME-C and SHAP-C. (2020). Yanou Ramon, David Martens, Foster Provost, and Theodoros Evgeniou. Advances in Data Analysis and Classification. Available online. Full pdf available here.
Deep Learning on Big, Sparse, Behavioral Data. (2019). Sofie De Cnudde, Yanou Ramon, David Martens, and Foster Provost. Big Data, 7(4), p. 286-307. Available online. Full pdf available here.
Understanding Consumer Preferences for Explanations Generated by XAI Algorithms. (2021). Yanou Ramon, Tom Vermeire, Olivier Toubia, David Martens and Theodoros Evgeniou. Working paper available here.
2022 - present: Guest lecturer “AI for Everyone: Demystifying the basics of Artificial Intelligence” at the Master’s course Business Contracts & Technology of Prof. Jan Blockx (Faculty of Law, University of Antwerp). Find the presentation here.
2022 - present: Ambassador of Women in Data Science. More information here.
2018 - 2022: Teaching assistant for Data Mining, Ethics in Data Science, Case studies and trends in Data Mining, and Data Engineering (Major Data Science). Responsible for the Python tutorials and the Data Science Challenge in collaboration with AXA Insurance.
2022: Co-organizer of the Summer School “The American Business Environment” in Washington DC together with Jonas Vandenbruaene.
Explainable AI to Gain Insight into Big Five Personality Predictions from Financial Transaction Records.
SPSP Annual Convention, San Francisco (US), February 2022
Poster Presentation: Suppl. Material 1, Suppl. Material 2
Explainable AI for psychological profiling from behavioral data.
Joint Seminar ADM+ADREM (online), December 2021
Gaining insight into AI systems on digital footprints.
Invited Talk at Data Science Meetup Leuven, March 2021
Watch my presentation here (begins at 57:40)
A Comparison of Counterfactual Algorithms for Explaining Prediction Models on Behavioral and Textual Data.
Invited Talk at Advances in Interpretable Machine Learning and Artificial Intelligence Workshop (CIKM online conference), October 2020
Towards Explainable Prediction Models on Behavioral and Textual Data.
Online Research Seminar, Dept. of Engineering Management, Antwerp (Belgium), June 2020
Explaining predictive models: the Evidence Counterfactual.
Blogpost Faculty of Business & Economics, University of Antwerp, June 2020
Evidence Counterfactuals for explaining predictive models on Big Data.
KDnuggets blogpost, May 2020
Counterfactual explanations for models built from behavioral and textual data.
Invited Talk at LenddoEFL (Knowledge Sharing), New York City (USA), October 2019
Comparative study of instance-level explanations for big, sparse data.
EURO Conference, Dublin (Ireland), June 2019
Instance-based explanations: motivation, overview, and the evidence counterfactual approach.
European Conference on Data Analysis, Bayreuth (Germany), March 2019, Book of abstracts, ECDA 2019
MovieLens 1M dataset contains 1 million movie ratings from 6,000 users on 4,000 movies. Gender, age, and occupation of all users are available.
MovieLens 100K dataset contains 100,000 ratings from 1,000 users on 1,700 movies. Gender, age, and occupation of all users are available.
Yahoo! Movies dataset contains movie ratings and descriptive content information. Gender and age of all users are available.
Libimseti dataset contains 17 million (anonymous) profile ratings from users of the Czech social network Libimseti.cz. Gender of the users is available.
Flickr dataset contains millions of Flickr pictures with the number of comments per photo and favorite markings by users.
TaFeng dataset contains Chinese grocery store transactions. Socio-demographic information of customers is available.
PhoneStudy dataset contains self-reported Big Five personality traits, 1859 variables tapping real-world mobile sensing behavior, and demographics (Stachl et al., 2019).
Twitter sentiment dataset contains tweets about U.S. airlines labeled with sentiment.
20 Newsgroups dataset contains 20,000 news posts labeled with a topic (e.g., science).