An Empirical Study of LoRA-based Fine-tuning of Large Language Models for Automated Test Case Generation
arXiv:2604.06946v1 (announce type: cross)
Abstract
Automated test case generation from natural language requirements remains a challenging problem in software engineering due to the ambiguity of requirements and the need to produce structured, executable test artifacts. Recent advances in large language models (LLMs) have shown promise in addressing this task; however, their effectiveness depends on task-specific adaptation and efficient fine-tuning strategies. In this paper, we present a comprehensive empirical study on the use of parameter-efficient fine-tuning, specifically LoRA (Low-Rank Adaptation), for requirement-based test case generation.
Key Findings
- We evaluate multiple LLM families, including both open-source and proprietary models, under a unified experimental pipeline.
- The study systematically explores the impact of key LoRA hyperparameters, including rank, scaling factor, and dropout, on downstream performance.
- We propose an automated evaluation framework based on GPT-4o, which assesses generated test cases across nine quality dimensions.
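The three LoRA hyperparameters named above can be made concrete with a small sketch. It assumes the standard LoRA formulation, in which a frozen weight matrix W of shape d_out x d_in receives a low-rank update (alpha / r) * B A, with A of shape r x d_in and B of shape d_out x r, and dropout applied on the adapter path during training. The class name `LoraSetting`, the helper `build_grid`, the layer size, and the candidate values are all hypothetical; this section does not state the search space the study actually used.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LoraSetting:
    """One point in a LoRA hyperparameter search (illustrative only)."""
    rank: int       # r: inner dimension of the low-rank update B @ A
    alpha: float    # scaling factor; the update is multiplied by alpha / rank
    dropout: float  # dropout probability on the adapter path during training

    @property
    def scaling(self) -> float:
        # Effective multiplier applied to the low-rank update.
        return self.alpha / self.rank

    def trainable_params(self, d_in: int, d_out: int) -> int:
        # A has rank * d_in entries, B has d_out * rank entries.
        return self.rank * (d_in + d_out)


def build_grid(ranks, alphas, dropouts):
    """Enumerate every combination of the three hyperparameters."""
    return [LoraSetting(r, a, p) for r, a, p in product(ranks, alphas, dropouts)]


# Hypothetical search space, not the one used in the study.
grid = build_grid(ranks=[8, 16, 32], alphas=[16, 32], dropouts=[0.0, 0.05])

# Hypothetical square projection layer of width 4096.
d_in = d_out = 4096
full_params = d_in * d_out

for setting in grid[:3]:
    share = setting.trainable_params(d_in, d_out) / full_params
    print(setting, f"scaling={setting.scaling:.2f}", f"trainable share={share:.4%}")
```

The sketch only enumerates settings and reports how small the trainable share of a single layer is at each rank; training and scoring each setting would be separate steps in a real pipeline.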
Results and Discussion
Experimental results demonstrate that LoRA-based fine-tuning significantly improves the performance of all open-source models, with the Ministral-8B model achieving the best results among them. Furthermore, our findings indicate that a fine-tuned 8B open-source model can achieve performance comparable to that of GPT-4.1 models before fine-tuning. This highlights the effectiveness of parameter-efficient adaptation for automated test case generation.
While GPT-4.1 models achieve the highest overall performance, the performance gap between proprietary and open-source models is substantially reduced after fine-tuning. This finding is particularly noteworthy as it suggests that cost-efficient, locally deployable open-source models can serve as viable alternatives to proprietary systems, provided they are combined with well-designed fine-tuning approaches.
Implications for Software Engineering
The study provides guidance on model selection, fine-tuning strategies, and evaluation methods for automated test generation. Software engineers can use these findings to improve the quality and reliability of generated test cases.
Conclusion
Our empirical study of LoRA-based fine-tuning for automated test case generation underscores the potential of parameter-efficient adaptation to enhance the capabilities of large language models. The significant improvements observed across all open-source models evaluated highlight the importance of task-specific fine-tuning in addressing the challenges posed by ambiguous requirements in software engineering.
