Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs
arXiv:2604.06603v1 (cross-listed)
Abstract: Large language models (LLMs) have shown strong knowledge reserves and task-solving capabilities, but they still suffer from severe hallucination, which hinders their practical application. Although scientific theories and rules can efficiently direct the behavior of human operators, LLMs still do not make sufficient use of this highly condensed knowledge through training or prompting. To address this issue, we propose SciDC, an LLM generation method that integrates subject-specific knowledge with strong constraints.
By adopting strong LLMs to automatically convert flexible knowledge into multi-layered, standardized rules, we build an extensible framework to effectively constrain model generation on domain tasks. Experiments on scientific tasks including industrial formulation design, clinical tumor diagnosis, and retrosynthesis planning consistently demonstrate the effectiveness of our method, achieving a 12% accuracy improvement on average compared with vanilla generation.
Furthermore, we discuss the potential of LLMs to summarize highly condensed knowledge automatically by induction, looking ahead to practical solutions for accelerating the overall scientific research process.
Introduction
Large language models (LLMs) have changed how we interact with technology and information. However, their propensity for hallucination, that is, generating incorrect or misleading information, remains a significant barrier to practical application in many fields. This raises the question: how can we make LLMs more reliable?
The Role of Scientific Knowledge
Scientific theories and rules condense a great deal of domain knowledge, and they already guide human experts efficiently. Applied explicitly, they could guide LLM behavior in ways that training or prompting alone does not. Existing LLM methods, however, make insufficient use of this knowledge, which limits their reliability.
Introducing SciDC
SciDC (Scientific Decoding Constraints) is a generation method designed to bridge this gap by integrating subject-specific knowledge with strong constraints. It uses strong LLMs to automatically convert flexible knowledge into multi-layered, standardized rules, and those rules then constrain model generation on domain tasks. The framework is extensible, so it can be adapted to different domains.
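To make the idea of rules constraining generation concrete, here is a minimal sketch of one common way to enforce hard constraints at decoding time: masking out candidate tokens that any rule forbids before picking the next token. This is not the SciDC implementation, whose rule format and decoding procedure are not described in this summary. The Rule class, the layer labels, the toy scoring function, and the two example rules are all hypothetical and exist only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule format (an assumption, not the paper's)."""
    layer: str  # illustrative label for the rule's level, e.g. "format" or "domain"
    name: str
    allows: Callable[[List[str], str], bool]  # (prefix, candidate) -> allowed?


def constrained_greedy_step(prefix: List[str],
                            scores: Dict[str, float],
                            rules: List[Rule]) -> str:
    """Return the highest-scoring token that no rule forbids."""
    masked = {
        tok: (score if all(r.allows(prefix, tok) for r in rules) else -math.inf)
        for tok, score in scores.items()
    }
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("every candidate token is forbidden by the rules")
    return best


def decode(score_fn: Callable[[List[str]], Dict[str, float]],
           rules: List[Rule],
           max_steps: int = 10,
           eos: str = "<eos>") -> List[str]:
    """Greedy decoding with rule-based masking applied at every step."""
    prefix: List[str] = []
    for _ in range(max_steps):
        tok = constrained_greedy_step(prefix, score_fn(prefix), rules)
        if tok == eos:
            break
        prefix.append(tok)
    return prefix


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a language model: fixed preferences over a tiny vocabulary.
    The end-of-sequence score grows with the length of the output."""
    return {"bleach": 5, "ammonia": 4, "surfactant": 3, "water": 2,
            "<eos>": 2 * len(prefix)}


# Two illustrative rules on a toy "formulation" task (both are assumptions).
RULES = [
    Rule("format", "no_repeated_ingredient",
         lambda prefix, tok: tok not in prefix),
    Rule("domain", "bleach_and_ammonia_incompatible",
         lambda prefix, tok: not ({"bleach", "ammonia"} <= set(prefix + [tok]))),
]

print(decode(toy_scores, []))     # unconstrained: ['bleach', 'bleach', 'bleach']
print(decode(toy_scores, RULES))  # constrained:   ['bleach', 'surfactant']
```

In this toy run the unconstrained decoder repeats its favorite token, while the constrained decoder skips both the repeat and the incompatible ingredient. With a real model, the same masking step would typically be applied to the logits over the full vocabulary at each generation step.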
Experimental Results
Our experiments span various scientific tasks, including:
- Industrial formulation design
- Clinical tumor diagnosis
- Retrosynthesis planning
In each task, SciDC consistently improved accuracy, with an average gain of 12% over vanilla generation.
Future Implications
We also discuss the potential of LLMs to summarize highly condensed scientific knowledge automatically. If that works in practice, it could accelerate the overall scientific research process, and it would make reliable LLM generation available in more fields.
Conclusion
Methods like SciDC are a step toward more reliable LLMs. By encoding scientific knowledge as structured constraints on generation, they can make LLM outputs more dependable and support broader adoption in critical tasks.
For those interested in exploring our work further, all code related to this paper is available on GitHub.
