turbovec: what it is, what problem it solves & why it's gaining traction
What it solves
turbovec is a high-performance vector index designed to reduce the massive RAM requirements of large-scale vector search. It allows users to fit millions of documents in a fraction of the memory (e.g., 10 million documents in 4 GB instead of 31 GB) while maintaining high search speed and recall, making it ideal for air-gapped or memory-constrained RAG stacks.
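The memory figures above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the embedding dimension (768), the 4-bit code width, and the 4 bytes of per-vector overhead are assumptions chosen because they roughly reproduce the quoted 31 GB and 4 GB figures; the source does not state them. The helper name `index_size_gb` is hypothetical.

```python
def index_size_gb(num_vectors, dim, bits_per_coord, extra_bytes_per_vector=0):
    """Approximate in-memory size of a flat vector index, in decimal GB."""
    bytes_per_vector = dim * bits_per_coord / 8 + extra_bytes_per_vector
    return num_vectors * bytes_per_vector / 1e9


# Assumed setup: 10 million vectors of dimension 768 (not stated in the source).
fp32_gb = index_size_gb(10_000_000, 768, 32)
# Assumed 4-bit codes plus one 4-byte scalar stored per vector.
quantized_gb = index_size_gb(10_000_000, 768, 4, extra_bytes_per_vector=4)

print(f"float32: {fp32_gb:.2f} GB, 4-bit: {quantized_gb:.2f} GB")
```

Under these assumptions the float32 index needs about 30.7 GB and the 4-bit index about 3.9 GB, which is in line with the figures quoted above.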
How it works
Built on Google Research's TurboQuant algorithm, the project uses a data-oblivious quantizer that requires no separate training phase. The process involves:
- Normalization and Rotation: Vectors are normalized to unit directions and multiplied by a random orthogonal matrix to make their coordinate distributions predictable.
- Calibration (TQ+): A shift and scale are fitted to each coordinate during the first ingestion to map empirical data to a canonical Beta distribution.
- Lloyd-Max Quantization: Coordinates are bucketed into 2-bit or 4-bit integers using precomputed optimal boundaries.
- Length-Renormalization: A scalar is stored per vector to correct the systematic underestimation of inner products caused by quantization, ensuring unbiased scoring.
- SIMD Search: Search is performed using hand-written NEON (ARM) and AVX-512BW (x86) kernels that score directly against codebook values without full decompression.
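The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not turbovec's implementation: it skips the TQ+ calibration step, approximates the post-rotation coordinate distribution with a Gaussian of variance 1/d instead of the Beta distribution the project uses, hard-codes the classic 4-level Lloyd-Max table for a standard normal, and defines the per-vector scalar as the ratio of the original length to the length of the reconstruction. All function names are hypothetical, and no SIMD is involved.

```python
import math
import random

# 4-level (2-bit) Lloyd-Max quantizer for a standard normal variable.
# Assumption: a rotated unit-vector coordinate times sqrt(d) is roughly N(0, 1).
BOUNDARIES = [-0.9816, 0.0, 0.9816]
CENTROIDS = [-1.5104, -0.4528, 0.4528, 1.5104]


def random_orthogonal(d, rng):
    """Build a random orthogonal d x d matrix by Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def rotate(matrix, x):
    """Multiply the matrix by the vector x."""
    return [sum(m * a for m, a in zip(row, x)) for row in matrix]


def encode(x, matrix):
    """Return (2-bit codes, per-vector scale) for one nonzero vector."""
    d = len(x)
    length = math.sqrt(sum(a * a for a in x))
    unit = [a / length for a in x]            # normalize to a unit direction
    rotated = rotate(matrix, unit)            # random rotation
    root_d = math.sqrt(d)
    # Bucket index = number of boundaries below the rescaled coordinate.
    codes = [sum(c * root_d > b for b in BOUNDARIES) for c in rotated]
    recon_norm = math.sqrt(sum(CENTROIDS[k] ** 2 for k in codes)) / root_d
    scale = length / recon_norm               # assumed form of the length correction
    return codes, scale


def score(query, codes, scale, matrix):
    """Estimate the inner product <query, x> directly from the codes."""
    q = rotate(matrix, query)                 # rotate the query once per search
    root_d = math.sqrt(len(q))
    return scale * sum(qi * CENTROIDS[k] for qi, k in zip(q, codes)) / root_d
```

Because the rotation is orthogonal, inner products are unchanged by it, so the query only needs to be rotated once and can then be scored against every stored code by table lookup.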
Who it’s for
Developers building Retrieval-Augmented Generation (RAG) applications where privacy, low latency, and memory efficiency are critical, particularly those running in local or air-gapped environments.
Highlights
- Online Ingest: No training step, parameter tuning, or index rebuilds are required as the corpus grows.
- Extreme Compression: Up to 16x compression (e.g., FP32 to 2-bit) with minimal recall loss.
- High Performance: Outperforms FAISS IndexPQFastScan by 10–19% on ARM and remains competitive on x86.
- Filtered Search: Supports search-time filtering via an allowlist, which is integrated directly into the SIMD kernel to avoid unnecessary computation.
- Framework Integrations: Drop-in replacements for in-memory vector stores in LangChain, LlamaIndex, Haystack, and Agno.
- Pure Local: No managed services; data remains on the local machine or VPC.
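The filtered-search idea in the list above can be illustrated with a small sketch. It is a conceptual stand-in only: it works on uncompressed Python lists, not on quantized codes or SIMD kernels, and the function name `filtered_top_k` is hypothetical and not part of turbovec's API. The point it shows is that ids outside the allowlist are skipped before any scoring work is done, instead of being scored first and discarded afterwards.

```python
import heapq


def filtered_top_k(query, vectors, allowlist, k):
    """Return the k best (score, doc_id) pairs among allowlisted ids only."""
    allowed = set(allowlist)
    scored = (
        (sum(q * v for q, v in zip(query, vec)), doc_id)
        for doc_id, vec in enumerate(vectors)
        if doc_id in allowed  # ids outside the allowlist are never scored
    )
    return heapq.nlargest(k, scored)


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
results = filtered_top_k([1.0, 0.0], vectors, allowlist=[1, 2, 3], k=2)
```

Here document 0 would be the best match for the query, but it is excluded by the allowlist, so the search returns documents 2 and 3.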
Sources
- RyanCodrai/turbovec