An In-Depth Study of Filter-Agnostic Vector Search on a PostgreSQL Database System: Experiments and Analysis
arXiv:2603.23710v1 (cross-list)
Abstract
Filtered Vector Search (FVS) is critical for supporting semantic search and GenAI applications in modern database systems. However, existing research most often evaluates algorithms in specialized libraries under optimistic assumptions that do not hold in enterprise-grade database systems. Our work challenges this premise by demonstrating that in a production-grade database system these commonly made assumptions break down, leading to performance characteristics and algorithmic trade-offs that are fundamentally different from those observed in isolated library settings.
Introduction
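This paper presents the first in-depth analysis of filter-agnostic FVS algorithms, which answer queries with arbitrary filter predicates without building predicate-specific indexes, within a production PostgreSQL-compatible system. We systematically evaluate two strategies across a wide range of filter selectivities and filter-query correlations: post-filtering, which retrieves unfiltered nearest-neighbor candidates and then discards those that fail the filter, and inline-filtering, which applies the filter while traversing the index.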
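To make the two strategies concrete, here is a minimal sketch over a toy clustering-based (IVF-style) index held in Python lists. `ToyIVF`, `post_filter`, `inline_filter`, `k_prime`, and `n_probe` are hypothetical names chosen for illustration; this is not the paper's implementation or any PostgreSQL API. The sketch counts only distance computations and filter checks; it does not model page accesses, which the paper identifies as a major cost inside a real database.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; ranks neighbours the same way as true L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Hypothetical IVF-style index: each row id is stored in the list of its nearest centroid."""

    def __init__(self, data: List[Vector], n_lists: int, seed: int = 0) -> None:
        self.data = data
        # Simplification: centroids are sampled rows rather than k-means centroids.
        self.centroids = random.Random(seed).sample(data, n_lists)
        self.lists: Dict[int, List[int]] = {c: [] for c in range(n_lists)}
        for i, v in enumerate(data):
            nearest = min(range(n_lists), key=lambda c: sq_l2(v, self.centroids[c]))
            self.lists[nearest].append(i)

    def probe(self, query: Vector, n_probe: int) -> List[int]:
        """Row ids from the n_probe lists whose centroids are closest to the query."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_l2(query, self.centroids[c]))
        return [i for c in order[:n_probe] for i in self.lists[c]]


def post_filter(index: ToyIVF, query: Vector, passes: Callable[[int], bool],
                k: int, k_prime: int, n_probe: int, stats: Dict[str, int]) -> List[int]:
    """Post-filtering: unfiltered top-k_prime search, then drop rows failing the filter."""
    scored = []
    for i in index.probe(query, n_probe):
        stats["distances"] += 1
        scored.append((sq_l2(query, index.data[i]), i))
    scored.sort()
    result: List[int] = []
    for _, i in scored[:k_prime]:
        stats["filter_checks"] += 1
        if passes(i):
            result.append(i)
            if len(result) == k:
                break
    return result  # can hold fewer than k rows when the filter is selective


def inline_filter(index: ToyIVF, query: Vector, passes: Callable[[int], bool],
                  k: int, n_probe: int, stats: Dict[str, int]) -> List[int]:
    """Inline filtering: check the filter on each probed row; compute distances only for passing rows."""
    scored = []
    for i in index.probe(query, n_probe):
        stats["filter_checks"] += 1
        if passes(i):
            stats["distances"] += 1
            scored.append((sq_l2(query, index.data[i]), i))
    scored.sort()
    return [i for _, i in scored[:k]]


def passes(i: int) -> bool:
    return i % 100 == 0  # ~1% selectivity, independent of the vectors


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
index = ToyIVF(data, n_lists=20)
query = [0.5] * 8
post_stats = {"distances": 0, "filter_checks": 0}
inline_stats = {"distances": 0, "filter_checks": 0}
post_ids = post_filter(index, query, passes, k=10, k_prime=40, n_probe=4, stats=post_stats)
inline_ids = inline_filter(index, query, passes, k=10, n_probe=4, stats=inline_stats)
print("post-filter:", len(post_ids), "rows,", post_stats)
print("inline-filter:", len(inline_ids), "rows,", inline_stats)
```

Post-filtering computes a distance for every probed row but may come back with fewer than `k` rows if few of the top `k_prime` candidates pass; inline filtering spends a filter check on every probed row and computes distances only for rows that pass.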
Key Findings
- System-Level Overheads: Our central finding is that the optimal algorithm is not dictated solely by the cost of distance computations. System-level overheads incurred by both distance computations and filter operations, such as page accesses and data retrieval, also play a significant role.
- Graph-Based vs. Clustering-Based Approaches: We demonstrate that graph-based approaches, such as NaviX/ACORN, can incur prohibitive numbers of filter checks and system-level overheads, which often negate their theoretical advantages over clustering-based approaches in real-world database environments.
- Optimal Algorithm Choice: No filter-agnostic FVS algorithm is optimal in every setting. The right choice is a system-aware decision, shaped by the interplay between workload characteristics and the underlying costs of data access in a real-world database architecture.
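The point that the best algorithm depends on system costs, not just distance computations, can be illustrated with a simple linear cost model. This is a sketch under stated assumptions: `OpCounts`, `UnitCosts`, `query_cost`, and `cheapest` are hypothetical, and every count and unit cost below is invented purely to show how the ranking can flip; none of them are measurements from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OpCounts:
    """Per-query operation counts for one strategy (illustrative numbers, not measurements)."""
    distances: int
    filter_checks: int
    page_accesses: int


@dataclass(frozen=True)
class UnitCosts:
    """Hypothetical cost of one operation, in arbitrary time units."""
    distance: float
    filter_check: float
    page_access: float


def query_cost(ops: OpCounts, unit: UnitCosts) -> float:
    """Linear cost model: each operation type contributes count * unit cost."""
    return (ops.distances * unit.distance
            + ops.filter_checks * unit.filter_check
            + ops.page_accesses * unit.page_access)


def cheapest(plans: Dict[str, OpCounts], unit: UnitCosts) -> str:
    """Name of the plan with the lowest modelled cost."""
    return min(plans, key=lambda name: query_cost(plans[name], unit))


# Made-up counts: the graph plan does fewer distance computations but many more
# filter checks and page accesses than the clustering plan.
plans = {
    "graph": OpCounts(distances=500, filter_checks=20_000, page_accesses=3_000),
    "clustering": OpCounts(distances=4_000, filter_checks=4_000, page_accesses=400),
}
# Library-like setting: filter checks are nearly free and data is already in memory.
in_memory = UnitCosts(distance=1.0, filter_check=0.01, page_access=0.0)
# Database-like setting: a filter check or page access may require fetching stored data.
in_database = UnitCosts(distance=1.0, filter_check=0.5, page_access=5.0)

print(cheapest(plans, in_memory))    # graph (700 vs 4040)
print(cheapest(plans, in_database))  # clustering (25500 vs 8000)
```

With identical operation counts, only the unit costs change between the two settings, yet the preferred plan changes; that is the sense in which the choice is system-aware.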
Methodology
Our analysis evaluated filter-agnostic FVS algorithms inside a PostgreSQL-compatible production system rather than a standalone library, so that measurements reflect the overheads a real database incurs. We compared post-filtering and inline-filtering strategies across a range of filter selectivities and correlations to assess how their performance changes from one scenario to another.
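One way to build such a selectivity and correlation sweep is sketched below. It assumes that "correlation" describes where the passing rows sit relative to the query (clustered near it, far from it, or scattered at random); the paper's actual workload generator may differ, and `make_filter` and `post_filter_yield` are hypothetical helpers. Exact search stands in for an index, and the sketch reports how many of the unfiltered top-`k_prime` neighbours survive the filter, which bounds how many results post-filtering can return.

```python
import random
from typing import List, Sequence, Set

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_filter(data: List[Vector], query: Vector, selectivity: float,
                correlation: str, seed: int = 0) -> Set[int]:
    """Row ids passing a synthetic filter with the given selectivity.

    correlation = "positive": passing rows are those nearest the query
    correlation = "negative": passing rows are those farthest from the query
    correlation = "none":     passing rows are drawn uniformly at random
    """
    n_pass = max(1, round(selectivity * len(data)))
    ids = list(range(len(data)))
    if correlation == "none":
        return set(random.Random(seed).sample(ids, n_pass))
    ids.sort(key=lambda i: sq_l2(query, data[i]))
    if correlation == "positive":
        return set(ids[:n_pass])
    if correlation == "negative":
        return set(ids[-n_pass:])
    raise ValueError(f"unknown correlation: {correlation!r}")


def post_filter_yield(data: List[Vector], query: Vector, passing: Set[int], k_prime: int) -> int:
    """How many of the unfiltered top-k_prime neighbours survive the filter."""
    top = sorted(range(len(data)), key=lambda i: sq_l2(query, data[i]))[:k_prime]
    return sum(1 for i in top if i in passing)


rng = random.Random(7)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [0.5] * 8
for selectivity in (0.01, 0.1, 0.5):
    for correlation in ("positive", "none", "negative"):
        passing = make_filter(data, query, selectivity, correlation)
        found = post_filter_yield(data, query, passing, k_prime=50)
        print(f"selectivity={selectivity:<5} correlation={correlation:<9} yield={found}/50")
```

In this setup, none of the 50 nearest candidates pass under negative correlation, even at 50% selectivity, so a post-filtering plan would have to enlarge `k_prime` or switch strategy to return any rows.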
Conclusion
This study gives the database community a clearer picture of the practical complexities of filter-agnostic FVS in production-grade systems. By evaluating existing algorithms in a PostgreSQL-compatible environment, we aim to guide future research and development in semantic search and GenAI applications.
