Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Abstract

High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data grows larger, in-memory indexes become extremely expensive because they require substantially more main memory. One possible solution is a disk-based implementation, which stores and searches vector data on high-performance devices such as NVMe SSDs. However, HVSS on a data segment remains challenging in vector databases, where one machine hosts multiple segments to support system features such as scaling. In this setting, each segment has limited memory and disk space, so HVSS on a data segment must balance accuracy, efficiency, and space cost. Existing disk-based methods are sub-optimal because they do not consider all these requirements together.

Publication
In ACM SIGMOD/PODS International Conference on Management of Data
Mengzhao Wang
PhD candidate

My research interests include high-dimensional vector similarity search, proximity graph-based index optimization, and vector data management systems.