Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework
Summary: arXiv:2603.23625v1
Abstract: Artificial intelligence (AI) is increasingly being explored in health and social care to reduce administrative workload and allow staff to spend more time on patient care. This paper evaluates a voice-enabled Care Home Smart Speaker designed to support everyday activities in residential care homes, including spoken access to resident records, reminders, and scheduling tasks.
A safety-focused evaluation framework is presented that examines the system end-to-end, combining Whisper-based speech recognition with retrieval-augmented generation (RAG) approaches (hybrid, sparse, and dense). Using supervised care-home trials and controlled testing, we evaluated 330 spoken transcripts across 11 care categories, including 184 reminder-containing interactions.
Evaluation Focus Areas
The evaluation focuses on three areas:
- Correct identification of residents and care categories: Ensuring that the system accurately recognizes and categorizes residents and their specific care needs.
- Reminder recognition and extraction: Assessing the system’s ability to accurately identify and extract reminders from spoken interactions.
- End-to-end scheduling correctness under uncertainty: Evaluating how well the system can convert spoken instructions into actionable scheduling events, including safe deferral and clarification mechanisms.
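The three checks above can be expressed as per-transcript pass/fail flags that are then averaged. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the `Outcome` schema, the function names, and the sample residents and categories are all hypothetical, and agreement on the number of reminders is used here as a simple stand-in for scheduling correctness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Result for one spoken transcript (hypothetical schema, not from the paper)."""
    resident_id: str
    care_category: str
    reminder_count: int  # reminders found, or calendar events created


def score_transcript(expected: Outcome, predicted: Outcome) -> dict[str, bool]:
    """Return one pass/fail flag per focus area for a single transcript."""
    return {
        # Focus area 1: resident and care category both match.
        "resident_and_category": (
            expected.resident_id == predicted.resident_id
            and expected.care_category == predicted.care_category
        ),
        # Focus area 2: presence or absence of any reminder is detected correctly.
        "reminder_recognised": (expected.reminder_count > 0)
        == (predicted.reminder_count > 0),
        # Focus area 3 (assumed proxy): the number of reminders agrees exactly.
        "exact_reminder_count": expected.reminder_count == predicted.reminder_count,
    }


def pass_rates(pairs: list[tuple[Outcome, Outcome]]) -> dict[str, float]:
    """Fraction of transcripts passing each check."""
    if not pairs:
        raise ValueError("need at least one transcript")
    scores = [score_transcript(e, p) for e, p in pairs]
    return {key: sum(s[key] for s in scores) / len(scores) for key in scores[0]}


# Illustrative data only: two made-up transcripts, not taken from the study.
pairs = [
    (Outcome("R1", "medication", 1), Outcome("R1", "medication", 1)),
    (Outcome("R2", "hydration", 2), Outcome("R2", "hydration", 1)),
]
rates = pass_rates(pairs)
```

In this made-up sample the second transcript has the right resident and category and is correctly flagged as containing a reminder, but only one of its two reminders is captured, so it fails only the exact-count check.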
Importance of Safety in Care Homes
Given the safety-critical nature of care homes, particular attention is paid to reliability in noisy environments and across diverse accents. The system is supported by confidence scoring, clarification prompts, and human-in-the-loop oversight, which are intended to keep interactions both safe and efficient.
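One way to picture how confidence scoring, clarification prompts, and human oversight might fit together is a simple threshold gate. This is a hedged sketch only: the source does not describe the actual decision logic, and the `Action` names, the `decide` function, and both threshold values are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"          # act on the interpreted instruction
    CLARIFY = "clarify"        # ask the speaker to confirm or repeat
    DEFER = "defer_to_staff"   # hand over to a human for review


def decide(confidence: float, accept_at: float = 0.85, clarify_at: float = 0.50) -> Action:
    """Map a confidence score in [0, 1] to an action.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if clarify_at > accept_at:
        raise ValueError("clarify_at must not exceed accept_at")
    if confidence >= accept_at:
        return Action.ACCEPT
    if confidence >= clarify_at:
        return Action.CLARIFY
    return Action.DEFER
```

With the assumed defaults, `decide(0.9)` accepts, `decide(0.6)` asks for clarification, and `decide(0.2)` defers to staff. In a safety-critical setting the thresholds would need to be tuned against real error costs rather than picked by hand.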
Results of the Evaluation
In the best-performing configuration (GPT-5.2), the evaluation produced the following results:
- Resident ID and care category matching: Achieved 100% accuracy (95% CI: 98.86-100).
- Reminder recognition: Reached 89.09% accuracy (95% CI: 83.81-92.80) with zero missed reminders, equating to 100% recall. However, some false positives were noted.
- End-to-end scheduling via calendar integration: Achieved 84.65% exact reminder-count agreement (95% CI: 78.00-89.56), indicating remaining challenges in converting informal spoken instructions into actionable events.
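The confidence intervals above show why a perfect observed score still has a lower bound below 100%. The source does not say which interval method was used; the sketch below assumes the Wilson score interval, one common choice for proportions, and the function name `wilson_interval` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the paper's interval method is not stated, so results here
    need not match the reported bounds exactly.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Assumes all 330 transcripts were scored for resident and category matching.
low, high = wilson_interval(330, 330)
```

Under that assumption the lower bound comes out at about 98.85%, close to but not identical with the reported 98.86, which is consistent with the authors using a different method or rounding. The denominators behind the other two intervals are not given in the source, so they are not reproduced here.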
Conclusion
The findings suggest that voice-enabled systems, when carefully evaluated and appropriately safeguarded, can support accurate documentation, effective task management, and trustworthy use of AI in care home settings. This research underlines the potential of AI to enhance the quality of care while ensuring the safety and well-being of residents.
