From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS
The rapid advancement of large language models (LLMs) has significantly enhanced our ability to generate human-like text. However, these models often fall short when tasked with reasoning that requires explicit symbolic structures, multi-step inference, and the ability to express interpretable uncertainty. A recent paper titled “From Natural Language to Executable Narsese” addresses these limitations by introducing a neuro-symbolic framework aimed at translating natural language reasoning problems into executable formal representations.
The framework uses first-order logic (FOL) as an intermediate form and Narsese, the language of the Non-Axiomatic Reasoning System (NARS), as the executable target. Its main contribution is a new benchmark, NARS-Reasoning-v0.1, which pairs natural language reasoning problems with their corresponding FOL forms and executable Narsese programs. Each problem carries one of three gold labels: True, False, or Uncertain.
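To make the pairing concrete, here is a minimal sketch of what one benchmark entry might look like in memory. The class name, field names, FOL notation, and the example problem are all hypothetical and chosen for illustration; the released dataset's actual schema may differ. The Narsese lines use standard inheritance (`-->`) and implication (`==>`) statements.

```python
from dataclasses import dataclass
from enum import Enum


class GoldLabel(Enum):
    """The three gold labels named in the benchmark description."""
    TRUE = "True"
    FALSE = "False"
    UNCERTAIN = "Uncertain"


@dataclass(frozen=True)
class ReasoningItem:
    """One benchmark entry. Field names are hypothetical, not the dataset's."""
    problem: str        # natural language premises plus the question
    fol: list[str]      # first-order logic forms of the premises
    narsese: list[str]  # executable Narsese: judgments, then a question
    label: GoldLabel


# A hand-written toy item, not taken from the benchmark.
item = ReasoningItem(
    problem="All birds are animals. Tweety is a bird. Is Tweety an animal?",
    fol=["forall x. Bird(x) -> Animal(x)", "Bird(tweety)"],
    narsese=[
        "<<$x --> bird> ==> <$x --> animal>>.",
        "<{tweety} --> bird>.",
        "<{tweety} --> animal>?",
    ],
    label=GoldLabel.TRUE,
)
```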
Key Features of NARS-Reasoning-v0.1
- Benchmark Development: NARS-Reasoning-v0.1 is a set of natural language reasoning problems, each paired with an FOL form, an executable Narsese program, and a gold label, so that LLM output can be checked against a symbolic reasoner.
- Compilation Pipeline: The paper details a deterministic compilation pipeline that converts FOL into executable Narsese, so the same FOL input always yields the same Narsese program.
- Runtime Validation: Benchmark items are validated by executing them in OpenNARS for Applications (ONA), which confirms that the symbolic targets are not only syntactically well formed but also produce the intended answer.
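The compile-then-validate idea in the last two items can be sketched in a few lines. This is a toy under stated assumptions, not the paper's pipeline: it handles only two formula shapes in a made-up FOL notation, and the `decide_label` rule with its 0.5 thresholds is a hypothetical way to map a NARS truth value (frequency, confidence) onto the three labels. The truth value is assumed to come from a reasoner run and is supplied by hand here.

```python
import re
from typing import Optional

# Toy FOL notation (an assumption): "Bird(tweety)" and
# "forall x. Bird(x) -> Animal(x)".
ATOM = re.compile(r"^(\w+)\((\w+)\)$")
RULE = re.compile(r"^forall (\w+)\. (\w+)\(\1\) -> (\w+)\(\1\)$")


def compile_fol(formula: str) -> str:
    """Deterministically map one toy FOL formula to a Narsese judgment."""
    rule = RULE.match(formula)
    if rule:
        var, antecedent, consequent = rule.groups()
        return (
            f"<<${var} --> {antecedent.lower()}> ==> "
            f"<${var} --> {consequent.lower()}>>."
        )
    atom = ATOM.match(formula)
    if atom:
        predicate, constant = atom.groups()
        # An individual is encoded as an extensional set: {tweety}.
        return f"<{{{constant}}} --> {predicate.lower()}>."
    raise ValueError(f"unsupported formula: {formula!r}")


def decide_label(
    frequency: Optional[float],
    confidence: Optional[float],
    min_confidence: float = 0.5,  # hypothetical threshold
) -> str:
    """Map a reasoner truth value to True / False / Uncertain."""
    if frequency is None or confidence is None or confidence < min_confidence:
        return "Uncertain"  # no answer, or too little evidence
    return "True" if frequency >= 0.5 else "False"


def is_valid(gold_label: str, frequency, confidence) -> bool:
    """An item passes if the executed answer matches the gold label."""
    return decide_label(frequency, confidence) == gold_label
```

For example, `compile_fol("Bird(tweety)")` yields `<{tweety} --> bird>.`, and `is_valid("True", 1.0, 0.81)` holds under the assumed thresholds. Keeping compilation free of randomness is what makes the validation step repeatable.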
Language-Structured Perception (LSP)
In addition to the benchmark, the authors present a novel concept called Language-Structured Perception (LSP). This approach focuses on training LLMs to produce reasoning-relevant symbolic structures rather than just final verbal responses. By emphasizing the importance of symbolic generation, LSP aims to enhance the quality of reasoning performed by LLMs.
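The difference between the two training targets can be illustrated with a small sketch. The function names and the prompt/target dictionary layout are assumptions made for this example; the paper's actual training format is not described here.

```python
def answer_only_example(problem: str, label: str) -> dict[str, str]:
    """Conventional supervision: the model learns to emit the final answer."""
    return {"prompt": problem, "target": label}


def structured_example(problem: str, narsese: list[str]) -> dict[str, str]:
    """LSP-style supervision: the model learns to emit the symbolic
    structure, leaving the inference itself to the reasoner."""
    return {"prompt": problem, "target": "\n".join(narsese)}


problem = "All birds are animals. Tweety is a bird. Is Tweety an animal?"
program = [
    "<<$x --> bird> ==> <$x --> animal>>.",
    "<{tweety} --> bird>.",
    "<{tweety} --> animal>?",
]

verbal = answer_only_example(problem, "True")
symbolic = structured_example(problem, program)
```

In the second form the model's output can be executed and inspected, which is what makes the reasoning checkable rather than only the final answer.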
Proof of Concept
As an initial proof of concept, the researchers trained and released a Phi-2 LoRA adapter on NARS-Reasoning-v0.1 for three-label reasoning classification. This suggests that the benchmark supports not only executable evaluation but also supervised adaptation.
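A three-label classifier of this kind is typically scored by normalizing each generated string to a label and comparing it with the gold label. The sketch below shows one way to do that; the prefix-matching rule and the function names are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Optional

LABELS = ("True", "False", "Uncertain")


def parse_label(text: str) -> Optional[str]:
    """Normalize raw model output to one of the three labels, or None."""
    lowered = text.strip().lower()
    for label in LABELS:
        if lowered.startswith(label.lower()):
            return label
    return None  # unparseable output counts as wrong


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions whose parsed label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(parse_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

With outputs `["True", "false.", "maybe"]` against gold labels `["True", "False", "Uncertain"]`, two of three parse to the right label, so the accuracy is 2/3.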
Conclusion
Overall, this paper positions executable symbolic generation and execution-based validation as a practical pathway toward more reliable neuro-symbolic reasoning systems. By connecting natural language input to a symbolic reasoner that tracks uncertainty explicitly, the proposed framework targets reasoning tasks where both the logical structure and the confidence of a conclusion need to be inspectable.
