ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
Summary: arXiv:2604.13392v1 Announce Type: new
Abstract: Tabular data remains prevalent in high-stakes domains such as healthcare and finance, where predictive models are expected to provide both high accuracy and faithful, human-understandable reasoning. While symbolic models offer verifiable logic, they lack semantic expressiveness. Meanwhile, general-purpose LLMs often require specialized fine-tuning to master domain-specific tabular reasoning. To address the dual challenges of scalable data curation and reasoning consistency, we propose ReSS, a systematic framework that bridges symbolic and neural reasoning models.
The ReSS Framework
ReSS leverages a decision-tree model to extract instance-level decision paths as symbolic scaffolds. These scaffolds, alongside input features and labels, guide a large language model (LLM) to generate grounded natural-language reasoning that strictly adheres to the underlying decision logic. The integration of symbolic structures with neural models enhances both interpretability and robustness in predictions.
Data Generation and Model Fine-Tuning
The resulting high-quality dataset is employed to fine-tune a pretrained LLM into a specialized tabular reasoning model. This fine-tuning process is further enhanced through a scaffold-invariant data augmentation strategy, which aims to improve the model’s generalization capabilities and explainability across various datasets.
Assessment of Faithfulness
To rigorously assess the faithfulness of the generated reasoning, we introduce quantitative metrics that include:
- Hallucination Rate: Measures the frequency of inaccuracies in the generated reasoning compared to the actual decision paths.
- Explanation Necessity: Assesses whether the explanations provided are essential for understanding the model’s predictions.
- Explanation Sufficiency: Evaluates if the explanations are comprehensive enough to convey the reasoning behind predictions.
Experimental Results
Experimental results on medical and financial benchmarks demonstrate that ReSS-trained models significantly improve upon traditional decision trees and standard fine-tuning approaches. Specifically, models trained using the ReSS framework show improvements of up to 10% in predictive performance while producing faithful and consistent reasoning.
Conclusion
The ReSS framework represents a significant advancement in the field of tabular data prediction, particularly in high-stakes domains where interpretability and accuracy are paramount. By bridging the gap between symbolic and neural reasoning, ReSS not only enhances model performance but also ensures that the reasoning process remains transparent and understandable to users. This innovative approach could pave the way for more reliable AI systems in critical areas such as healthcare and finance.
