Sparse-by-Design Cross-Modality Prediction: L0-Gated Representations for Reliable and Efficient Learning
arXiv:2603.26801v1 (announce type: cross)
Abstract
Predictive systems increasingly span heterogeneous modalities such as graphs, language, and tabular records, but sparsity and efficiency techniques remain modality-specific: graph edge or neighborhood sparsification, Transformer head or layer pruning, and separate tabular feature-selection pipelines. This fragmentation complicates deployment, makes results hard to compare, and weakens reliability analysis across end-to-end Knowledge Discovery and Data Mining (KDD) pipelines.
A unified sparsification primitive would make accuracy-efficiency trade-offs comparable across modalities and enable controlled reliability analysis under representation compression. The central question is whether a single representation-level mechanism can deliver such trade-offs while preserving or improving probability calibration.
Proposed Solution: L0-Gated Cross-Modality Learning (L0GM)
To address these challenges, we propose L0-Gated Cross-Modality Learning (L0GM), a modality-agnostic, feature-wise hard-concrete gating framework that enforces L0-style sparsity directly on learned representations. L0GM attaches hard-concrete stochastic gates to each modality’s classifier-facing interface:
- Node embeddings (for Graph Neural Networks – GNNs)
- Pooled sequence embeddings such as CLS (for Transformers)
- Learned tabular embedding vectors (for tabular models)
This design makes sparsification trainable end to end and exposes an explicit control knob for the active feature fraction, that is, the share of representation dimensions that stay switched on.
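To make the gating idea concrete, the following is a minimal pure-Python sketch of a feature-wise hard-concrete gate applied to a representation vector. It is an illustration under stated assumptions, not the L0GM implementation: the constants `GAMMA`, `ZETA`, and `BETA` are the defaults commonly used in the hard-concrete literature, and every function name here is hypothetical. A real model would implement the same arithmetic in a deep-learning framework so that the per-feature parameters `log_alpha` can be learned by gradient descent.

```python
import math
import random

# Stretch interval and temperature: common defaults from the hard-concrete
# literature, assumed here rather than taken from the L0GM paper.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha, rng):
    """Draw one stochastic gate value in [0, 1] (training-time behaviour)."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep log() finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))  # hard clip yields exact 0s and 1s


def deterministic_gate(log_alpha):
    """Noise-free gate, a common choice at inference time."""
    stretched = sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


def prob_active(log_alpha):
    """P(gate != 0): differentiable surrogate for one feature's L0 cost."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def gate_representation(h, log_alphas, rng=None):
    """Scale each feature of h by its gate (sampled if rng is given)."""
    if rng is None:
        gates = [deterministic_gate(a) for a in log_alphas]
    else:
        gates = [sample_gate(a, rng) for a in log_alphas]
    return [x * g for x, g in zip(h, gates)]


def expected_active_fraction(log_alphas):
    """Average probability that a gate is non-zero across all features."""
    return sum(prob_active(a) for a in log_alphas) / len(log_alphas)
```

In this sketch the same `gate_representation` call would wrap a GNN node embedding, a pooled CLS vector, or a tabular embedding, and `expected_active_fraction` is the quantity a sparsity penalty would push down.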
Optimization and Interpretability
To stabilize optimization and make trade-offs interpretable, we introduce an L0-annealing schedule that induces clear accuracy-sparsity Pareto frontiers. These frontiers let users inspect the trade-off and choose an operating point that balances accuracy against sparsity.
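The summary does not specify the form of the annealing schedule, so the sketch below assumes the simplest option: a linear ramp of the L0 penalty weight from zero to a target value. The names `l0_weight`, `warmup_steps`, and `lambda_max` are hypothetical. A second helper shows one way to extract a Pareto frontier from (active fraction, accuracy) pairs collected across runs.

```python
def l0_weight(step, warmup_steps, lambda_max):
    """Linearly ramp the L0 penalty weight from 0 to lambda_max.

    Assumed schedule for illustration; the paper's schedule may differ.
    """
    if warmup_steps <= 0:
        return lambda_max
    return lambda_max * min(1.0, step / warmup_steps)


def pareto_frontier(points):
    """Keep (active_fraction, accuracy) pairs that no other pair dominates.

    A pair is dominated when another pair is at least as sparse and at
    least as accurate, and is not the identical pair.
    """
    frontier = []
    for frac, acc in points:
        dominated = any(
            f <= frac and a >= acc and (f, a) != (frac, acc)
            for f, a in points
        )
        if not dominated:
            frontier.append((frac, acc))
    return sorted(set(frontier))
```

Under this assumption the training objective at each step would be the task loss plus `l0_weight(step, ...)` times the expected number of active gates, so early training is driven by accuracy and sparsity pressure is introduced gradually.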
Performance Evaluation
We evaluate L0GM on three public benchmarks: ogbn-products (graph), Adult (tabular), and IMDB (text). L0GM achieves competitive predictive performance while activating fewer representation dimensions, and it reduces Expected Calibration Error (ECE) in our evaluation.
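For readers unfamiliar with the calibration metric, here is a minimal sketch of the usual equal-width binned ECE estimator. The bin count of 10 and the function name are assumptions, and the paper's exact binning may differ. Each confidence is taken to be the model's predicted probability for its chosen class.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (n_b / N) * |accuracy_b - confidence_b|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 (or True) where the prediction was right, else 0 (or False).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += float(hit)

    # (n_b / N) * |hits_b / n_b - conf_b / n_b| simplifies to the line below;
    # empty bins contribute zero.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n
```

A value of 0 means that, within every bin, average confidence matches observed accuracy; larger values indicate over- or under-confidence.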
Conclusion
Overall, L0GM establishes a modality-agnostic, reproducible sparsification primitive that supports comparable accuracy, efficiency, and calibration trade-off analysis across heterogeneous modalities. A shared primitive of this kind is expected to simplify the development of predictive systems that mix modalities and to make their reliability easier to assess.
