Detecting Simpson’s Paradox: A Step Towards Fairness in Machine Learning

Rahul Sharma*, Minakshi Kaushik, Sijo Arakkal Peious, Markus Bertl, Ankit Vidyarthi, Ashwani Kumar, Dirk Draheim

*Corresponding author for this work

Publication: Chapter in book/Conference proceeding, Contribution to conference proceedings

Abstract

In the last two decades, artificial intelligence (AI) and machine learning (ML) have grown tremendously. However, understanding and assessing the impact of causality and statistical paradoxes remain critical challenges in these fields. These topics are currently widely discussed in the context of explainable AI (XAI) and algorithmic fairness, but they have not yet entered mainstream AI and ML application development. In this paper, we first discuss the impact of Simpson’s paradox on linear trends, i.e., on continuous values, and then demonstrate its effects on three benchmark training datasets used in ML. Next, we provide an algorithm for detecting Simpson’s paradox. The algorithm was tested on the three datasets and appears beneficial in detecting cases of Simpson’s paradox in continuous values. In the future, the algorithm could be used in designing a next-generation platform for fairness in ML.
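The abstract does not spell out the detection algorithm, so the following is only a minimal sketch of the general idea of a trend reversal in continuous values, not the authors' method. It assumes the paradox is flagged when the Pearson correlation over the pooled data has the opposite sign to the correlation inside every subgroup. The function names (`pearson`, `detect_reversal`) and the toy data are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def detect_reversal(rows):
    """rows: iterable of (group, x, y) tuples.

    Returns (overall_r, per_group_r, reversed_flag). The flag is True only
    when all subgroup correlations share one nonzero sign and the pooled
    correlation has the opposite sign (an assumed, simplified criterion).
    """
    rows = list(rows)
    overall = pearson([x for _, x, _ in rows], [y for _, _, y in rows])

    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))

    per_group = {
        group: pearson([x for x, _ in pts], [y for _, y in pts])
        for group, pts in groups.items()
    }

    group_signs = {sign(r) for r in per_group.values()}
    reversed_flag = sign(overall) != 0 and group_signs == {-sign(overall)}
    return overall, per_group, reversed_flag


# Hypothetical toy data: y rises with x inside each group,
# but group B sits at higher x and lower y than group A.
toy_rows = [
    ("A", 1, 10), ("A", 2, 11), ("A", 3, 12),
    ("B", 6, 1), ("B", 7, 2), ("B", 8, 3),
]
overall_r, group_r, is_reversed = detect_reversal(toy_rows)
print(overall_r, group_r, is_reversed)
```

A sign comparison like this is deliberately crude: it ignores sample size and statistical significance, so a practical detector would likely add thresholds or tests on top of it.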

Original language: English
Title of host publication: New Trends in Database and Information Systems - ADBIS 2022 Short Papers, Doctoral Consortium and Workshops
Subtitle of host publication: DOING, K-GALS, MADEISD, MegaData, SWODCH 2022, Proceedings
Editors: Silvia Chiusano, Tania Cerquitelli, Robert Wrembel, Kjetil Nørvåg, Barbara Catania, Genoveva Vargas-Solar, Ester Zumpano
Place of Publication: Cham
Publisher: Springer
Pages: 67-76
Number of pages: 10
ISBN (Electronic): 9783031157431
ISBN (Print): 9783031157424
Publication status: Published - 2022
Externally published: Yes
Event: 3rd Workshop on Intelligent Data - From Data to Knowledge, DOING 2022, 1st Workshop on Knowledge Graphs Analysis on a Large Scale, K-GALS 2022, 4th Workshop on Modern Approaches in Data Engineering and Information System Design, MADEISD 2022, 2nd Workshop on Advanced Data Systems Management, Engineering, and Analytics, MegaData 2022, 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage, SWODCH 2022 and Doctoral Consortium which accompanied 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022 - Turin, Italy
Duration: 5 Sept 2022 to 8 Sept 2022

Publication series

Series: Communications in Computer and Information Science
Volume: 1652
ISSN: 1865-0929

Conference

Conference: 3rd Workshop on Intelligent Data - From Data to Knowledge, DOING 2022, 1st Workshop on Knowledge Graphs Analysis on a Large Scale, K-GALS 2022, 4th Workshop on Modern Approaches in Data Engineering and Information System Design, MADEISD 2022, 2nd Workshop on Advanced Data Systems Management, Engineering, and Analytics, MegaData 2022, 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage, SWODCH 2022 and Doctoral Consortium which accompanied 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022
Country/Territory: Italy
City: Turin
Period: 5/09/22 to 8/09/22

Bibliographical note

Publisher Copyright:
© 2022, Springer Nature Switzerland AG.

Keywords

  • Artificial intelligence
  • Big data
  • Data science
  • Explainable AI
  • Machine learning
  • Simpson’s paradox
