Pavón Pérez, Ángel; Fernandez, Miriam; Al-Madfai, Hasan; Burel, Grégoire and Alani, Harith
(2023).
DOI: https://doi.org/10.1145/3578503.3583605
Abstract
Machine Learning (ML) algorithms are embedded within online banking services, proposing decisions about consumers’ credit cards, car loans, and mortgages. These algorithms are sometimes biased, resulting in unfair decisions toward certain groups. One common approach for addressing such bias is simply dropping the sensitive attributes (e.g. gender) from the training data. However, sensitive attributes can be indirectly represented by other attributes in the data (e.g. maternity leave taken). This paper addresses the problem of identifying attributes that can mimic sensitive attributes by proposing a new approach based on covariance analysis. Our evaluation, conducted on two different credit datasets extracted from a traditional and an online banking institution respectively, shows that our approach (i) effectively identifies the attributes in the data that encapsulate sensitive information and (ii) reduces bias in ML models while maintaining their overall performance.
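The abstract does not spell out the covariance analysis itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple stand-in technique: compute the covariance between each candidate attribute and the sensitive attribute, normalise it to a Pearson correlation, and flag attributes whose absolute score reaches a threshold. The function names, the column names, the toy data, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def proxy_candidates(columns, sensitive, threshold=0.5):
    """Flag columns whose normalised covariance with `sensitive` is large.

    Assumption: a thresholded Pearson correlation is used here as a simple
    stand-in; the paper's actual procedure may differ.
    """
    flagged = {}
    var_s = covariance(sensitive, sensitive)
    for name, values in columns.items():
        var_x = covariance(values, values)
        if var_x == 0 or var_s == 0:
            continue  # a constant column carries no covariance signal
        score = covariance(values, sensitive) / sqrt(var_x * var_s)
        if abs(score) >= threshold:
            flagged[name] = score
    return flagged


# Made-up toy data: 1/0 encodings for eight hypothetical applicants.
gender = [1, 1, 1, 0, 0, 0, 1, 0]
columns = {
    "maternity_leave_taken": [1, 1, 0, 0, 0, 0, 1, 0],
    "income_band": [2, 3, 1, 2, 3, 1, 2, 2],
}

flagged = proxy_candidates(columns, gender)
print(flagged)
```

On this toy data only `maternity_leave_taken` is flagged, since `income_band` has zero covariance with the encoded sensitive attribute. A flagged attribute would then be a candidate for removal or further inspection before training.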