Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, Charles Xie

November 2023

Abstract

High-dimensional vector similarity search (HVSS) is receiving a spotlight as a powerful tool for various data science and AI applications. As vector data grows larger, in-memory indexes become extremely expensive because they necessitate substantial expansion of main memory resources. One possible solution is to use disk-based implementation, which stores and searches vector data in high-performance devices like NVMe SSDs. However, HVSS for data segment is still challenging in vector databases, where one machine has multiple segments for system features (like scaling) purpose. In this setting, each segment has limited memory and disk space, so HVSS on data segment needs to balance accuracy, efficiency, and space cost. Existing disk-based methods are sub-optimal because they do not consider all these requirements together.

Type

Conference paper

Publication

In ACM SIGMOD/PODS International Conference on Management of Data

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Abstract

Mengzhao Wang

PhD candidate