Abstract
In this study, we present a comprehensive evaluation framework for comparing combinations of artificial intelligence (AI) methods in the context of explainable AI (XAI) for variable selection in experimental biological and biomedical data. Our goal was to assess the efficiency, computational cost, and accuracy of different method combinations across six simulated scenarios, each replicated ten times. These scenarios span classification and regression complexities, including variance differences, bimodal distributions, eXclusive-OR (XOR) interactions, concentric circles, and nonlinear relationships such as parabolic and sinusoidal functions. We tested several machine learning algorithms, including Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), and Multi-Layer Perceptrons (MLP), and combined these models with diverse feature-importance methods such as Gini importance, accuracy decrease, SHAP (SHapley Additive exPlanations) values, and Olden's method. We further applied significance-thresholding approaches, namely PIMP (Permutation IMPortance), mProbes, and simThresh, a novel method developed for this study. Additionally, we explored different dataset sizes to evaluate the scalability of these methods. Our analysis revealed substantial differences in computational demands, ranging from very rapid evaluations (e.g., DT combined with Gini importance and simThresh, averaging 0.15 seconds) to extensive computations (e.g., MLP combined with SHAP and PIMP, exceeding 7 hours). Among the tested combinations, RF/Accuracy/PIMP achieved the best overall performance, correctly identifying the relevant variables in 59 of 60 replicates in our benchmark study. However, its computational demands raise concerns about scalability to large-scale omics datasets in real-world settings. Decision Tree or Random Forest models combined with Gini importance and simThresh ranked second, with 50 of 60 detections. While less accurate, these methods require far fewer computational resources, making them promising candidates for scalable applications in omics data analysis. The proposed evaluation framework thus serves as a valuable tool for method selection, particularly when dealing with large-scale omics datasets where both computational resources and accuracy are critical considerations.
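To make the benchmarked pipeline concrete, the sketch below chains one combination from each layer of the framework described above: a simulated XOR classification scenario, a Random Forest with Gini importance, and a PIMP-style significance threshold obtained by refitting the model on permuted responses. This is a minimal illustration assuming scikit-learn, not the authors' code: the sample size, noise-feature count, and permutation count are arbitrary choices rather than the study's settings, and simThresh and mProbes are not reproduced here.

```python
# Hypothetical sketch of one benchmark combination:
# RF / Gini importance / PIMP-style permutation threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Simulate an XOR scenario: two informative features whose interaction
# (not either marginal effect) determines the class, plus pure noise.
n = 500
X_inf = rng.normal(size=(n, 2))
y = ((X_inf[:, 0] > 0) ^ (X_inf[:, 1] > 0)).astype(int)
X = np.hstack([X_inf, rng.normal(size=(n, 8))])  # 8 uninformative features

# Observed Gini importances from a fitted Random Forest.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
observed = rf.feature_importances_

# PIMP-style null: refit on permuted responses many times; any importance
# a feature earns under permuted labels is attributable to chance alone.
n_perm = 50
null = np.empty((n_perm, X.shape[1]))
for b in range(n_perm):
    y_perm = rng.permutation(y)
    null[b] = RandomForestClassifier(
        n_estimators=200, random_state=b
    ).fit(X, y_perm).feature_importances_

# Empirical p-value per feature: fraction of null importances >= observed.
p_values = (null >= observed).mean(axis=0)
selected = np.where(p_values < 0.05)[0]
print("selected features:", selected)  # ideally [0, 1], the XOR pair
```

The sketch also hints at why the runtime differences reported above are so large: a permutation-based threshold such as PIMP refits the underlying model once per permutation, so its cost scales with both the model's training time and the number of permutations.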
Publication Info
- Year: 2025
- Type: article
- Citations: 0
- Access: Closed
Identifiers
- DOI: 10.64898/2025.12.06.692723