Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach
Summary: arXiv:2604.06838v1
Abstract: In this paper, we propose using Learning from Answer Sets to approximate black-box models, such as Neural Networks (NNs), in the specific setting of learning user preferences.
Introduction
The increasing adoption of Neural Networks (NNs) across many domains has created a growing need for methods that can interpret these opaque models. The need is especially pressing in preference learning, where users and system designers want to understand why one option is ranked above another. This paper introduces a post-hoc approach that uses Inductive Logic Programming (ILP) to learn interpretable logic programs that approximate the preferences predicted by a trained NN.
Methodology
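We explore the use of ILASP (Inductive Learning of Answer Set Programs) to approximate NN-based preference learning systems. Weak constraints provide the bridge between the two: in Answer Set Programming, weak constraints induce a preference ordering over answer sets, so a learned set of weak constraints can serve as a human-readable model of the ranking that the NN has learned. Our methodology consists of the following steps: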
- Dataset Creation: We constructed a dataset of user preferences over a collection of recipes, which serves as training data for the NNs.
- Training Neural Networks: We train NNs on this dataset to predict user preferences from recipe attributes.
- ILASP Approximation: We apply ILASP to approximate the trained NNs, using both global and local approximation.
- Dimensionality Reduction: To make the approximation more efficient, we add a preprocessing step that applies Principal Component Analysis (PCA) to reduce the dimensionality of the dataset.
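As a rough illustration of the dimensionality-reduction step, the following Python sketch projects numeric recipe features onto their top principal components. It assumes recipes are already encoded as numeric vectors. The feature names and values are hypothetical, and the use of power iteration with deflation (chosen only to keep the sketch dependency-free) is an assumption, not the paper's implementation. In practice a library routine such as scikit-learn's `sklearn.decomposition.PCA` would be the usual choice.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract each feature's mean so that PCA operates on centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centred: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(cov: Matrix, iters: int = 1000, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration: repeatedly multiply a unit vector by cov and renormalise."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in this subspace
            break
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T C v estimates the corresponding eigenvalue.
    lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v


def pca(rows: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Project rows onto their top-k principal components; returns (projected, components)."""
    centred = center(rows)
    cov = covariance(centred)
    d = len(cov)
    components: Matrix = []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next power iteration finds the next direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x * c for x, c in zip(r, comp)) for comp in components]
                 for r in centred]
    return projected, components


# Hypothetical recipe features: [prep_minutes, cook_minutes, spice_level].
recipes = [
    [10, 12, 1],
    [20, 22, 3],
    [30, 29, 2],
    [40, 41, 5],
    [50, 52, 4],
]
projected, components = pca(recipes, k=2)  # 3 features -> 2 coordinates per recipe
```

In this toy data the two strongly correlated time features collapse largely onto the first component. Under this setup, ILASP would presumably search for weak constraints over the reduced coordinates rather than the raw attributes. How the paper maps continuous components to logic atoms is not specified in this summary.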
Experiments
Our experiments evaluate ILASP as both a global and a local approximator of NNs. The main challenges arise in high-dimensional feature spaces, where it is difficult to achieve high fidelity to the original model while keeping computation tractable. The experiments address four aspects:
- Global Approximation: We assess how well ILASP can replicate the overall behavior of the NN across the entire dataset.
- Local Approximation: We examine the ability of ILASP to provide explanations for individual predictions made by the NN.
- Fidelity Assessment: We measure fidelity, i.e., how closely the approximating programs reproduce the predictions of the original NN, as an indicator of whether the resulting explanations are faithful to it.
- Computational Efficiency: We measure the time required for training and approximation, with the aim of keeping both within practical limits.
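The interplay between weak constraints and fidelity can be made concrete with a small sketch. In ASP, a weak constraint `:~ body. [w@l]` adds a penalty `w` to any answer set in which its body holds, and answer sets with a lower total penalty are preferred, so a set of weak constraints induces a ranking over recipes. The Python sketch below assumes a single priority level, encodes each constraint body as a Python predicate, uses a hypothetical scoring function `toy_nn_score` as a stand-in for the trained NN, and defines fidelity as the fraction of recipe pairs on which the two rankings agree (ties included). These are illustrative choices rather than the paper's exact definitions, and the hypothesis is written by hand instead of being learned by ILASP.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List

Recipe = Dict[str, float]


@dataclass(frozen=True)
class WeakConstraint:
    """Simplified stand-in for an ASP weak constraint `:~ body. [weight@1]`."""
    name: str
    body: Callable[[Recipe], bool]  # true when the penalised condition holds
    weight: int


def cost(recipe: Recipe, hypothesis: List[WeakConstraint]) -> int:
    """Total penalty: sum of weights of constraints whose body holds (one priority level)."""
    return sum(c.weight for c in hypothesis if c.body(recipe))


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def program_order(a: Recipe, b: Recipe, hypothesis: List[WeakConstraint]) -> int:
    """+1 if a is preferred (lower cost), -1 if b is preferred, 0 on a tie."""
    return sign(cost(b, hypothesis) - cost(a, hypothesis))


def toy_nn_score(r: Recipe) -> float:
    """Hypothetical stand-in for the trained NN: a higher score means more preferred."""
    return -0.02 * r["prep_minutes"] - 0.001 * r["calories"] + 0.5 * r["vegetarian"]


def nn_order(a: Recipe, b: Recipe) -> int:
    """+1 if the (toy) NN prefers a, -1 if it prefers b, 0 on a tie."""
    return sign(toy_nn_score(a) - toy_nn_score(b))


def fidelity(recipes: List[Recipe], hypothesis: List[WeakConstraint]) -> float:
    """Fraction of recipe pairs on which the program and the NN agree."""
    pairs = list(combinations(recipes, 2))
    agree = sum(program_order(a, b, hypothesis) == nn_order(a, b) for a, b in pairs)
    return agree / len(pairs)


# Hypothetical recipes and a hand-written hypothesis.
recipes = [
    {"prep_minutes": 20, "calories": 400, "vegetarian": 1},
    {"prep_minutes": 60, "calories": 300, "vegetarian": 1},
    {"prep_minutes": 30, "calories": 700, "vegetarian": 0},
    {"prep_minutes": 90, "calories": 500, "vegetarian": 0},
]
hypothesis = [
    WeakConstraint("long_prep", lambda r: r["prep_minutes"] > 45, 2),
    WeakConstraint("not_vegetarian", lambda r: r["vegetarian"] == 0, 1),
]
print(f"fidelity = {fidelity(recipes, hypothesis):.3f}")  # 5 of 6 pairs agree -> 0.833
```

The only disagreement is between the second and third recipes. In this toy example, raising the weight of `not_vegetarian` from 1 to 3 makes the two rankings agree on all six pairs, which is the kind of weight adjustment a learner such as ILASP searches over when fitting an approximation.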
Conclusion
This work takes a step towards making Neural Network predictions more interpretable in preference learning. By combining ILASP with dimensionality reduction, we obtain approximations that balance fidelity to the original model against computational efficiency. Our findings contribute to ongoing work on explainable AI, particularly in domains where user preferences play a central role.
Under consideration for publication in Theory and Practice of Logic Programming (TPLP).
