Vector DB Nuances and Fixes

Your Vector DB Is Lying: The Index Structures Destroying Recall (And How to Fix It)

“Your vector database promises 99% recall. You’re getting 67%. Here’s why every index structure makes tradeoffs, and how to choose the right one for YOUR use case.”

TL;DR: Vector databases don’t store and search vectors the way you think. HNSW builds a navigation graph whose greedy traversal can miss true nearest neighbors. IVF clustering can put similar vectors in different buckets, so a query that probes too few buckets never sees them. When metadata filtering is applied AFTER the search, it discards most of the candidates and destroys recall. This guide explains how each vector DB actually works, when to use which, and how to configure them properly. ...

October 9, 2025 · 13 min · Lakshay Chhabra
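The post-filtering claim in the teaser above can be illustrated with a small sketch. It is not taken from the article: it uses exact brute-force search on synthetic data, and the names `post_filter`, `pre_filter`, and the `"rare"` tag are hypothetical. The point is only that filtering a fixed top-k after the search can leave far fewer than k matching results, while filtering first keeps all k.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Exact top-k by cosine similarity; items are (id, vector, tag) tuples."""
    ranked = sorted(items, key=lambda it: cosine(query, it[1]), reverse=True)
    return ranked[:k]


def post_filter(query, items, k, tag):
    """Search first, then drop results whose tag does not match."""
    return [it for it in top_k(query, items, k) if it[2] == tag]


def pre_filter(query, items, k, tag):
    """Restrict to matching items first, then search among them."""
    return top_k(query, [it for it in items if it[2] == tag], k)


# Synthetic data (assumption): 1000 random vectors, 10% tagged "rare".
rng = random.Random(0)
items = [
    (i, [rng.gauss(0, 1) for _ in range(16)], "rare" if i % 10 == 0 else "common")
    for i in range(1000)
]
query = [rng.gauss(0, 1) for _ in range(16)]
k = 10

truth = {it[0] for it in pre_filter(query, items, k, "rare")}
got = {it[0] for it in post_filter(query, items, k, "rare")}
recall = len(truth & got) / len(truth)
print(f"post-filter returned {len(got)} of {k} wanted; recall = {recall:.2f}")
```

Real vector databases differ in whether and how they filter during index traversal, so treat this as a model of the failure mode rather than a description of any particular engine.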

Embeddings 3072: The Dimension Lie

The Dimension Lie: Why Your 3072D Embeddings Are Mostly Zeros (And How to Fix It)

“You’re paying for 3072 dimensions. PCA reveals you’re using ~200. But here’s the twist - those extra dimensions aren’t useless, they’re insurance. Let me show you the math.”

TL;DR: Your high-dimensional embeddings aren’t “mostly zeros” - they occupy all dimensions but with exponentially decaying variance and redundancy. Yes, you can compress them to 200-256 dimensions with minimal loss. No, companies aren’t stupid for shipping 3072D models - they’re optimizing for general purpose use. This guide shows you how to measure what YOU actually need and optimize accordingly. ...

October 5, 2025 · 14 min · Lakshay Chhabra
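One way to make "measure what YOU actually need" concrete is to count how many principal components are required to reach a target share of total variance. The sketch below is an illustration, not the article's method: it assumes you already have the PCA eigenvalues, and it substitutes a synthetic exponentially decaying spectrum with a hypothetical decay constant `TAU`. Real embedding spectra have to be measured on your own data.

```python
import math


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of top components whose variance share reaches `target`."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total >= target:
            return count
    return len(eigenvalues)


DIM = 3072
TAU = 60.0  # hypothetical decay constant, chosen only for illustration

# Synthetic spectrum (assumption): variance of component i decays as exp(-i / TAU).
spectrum = [math.exp(-i / TAU) for i in range(DIM)]

needed_95 = components_for_variance(spectrum, 0.95)
needed_99 = components_for_variance(spectrum, 0.99)
print(f"95% of variance: {needed_95} components; 99%: {needed_99} components")
```

Explained variance is only a proxy. Whether a truncated embedding is good enough depends on retrieval quality on your own queries, which this count does not measure.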

How Words Become Vectors: Embeddings Inside Transformers (Without Tears)

“Your embedding API returns a vector. Here are the 12 disasters happening inside and the 5 you can actually fix.”

TL;DR: Modern embedding libraries handle CLS pooling disasters for you. But they can’t fix anisotropy (vectors clustering in narrow cones), length bias (longer texts having systematically different magnitude distributions), or domain vocabulary collapse (subword tokenization destroying semantic units). This guide shows what’s breaking, what’s already fixed, and what you still need to handle. ...

October 3, 2025 · 11 min · Lakshay Chhabra
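Anisotropy, as described in the teaser above, can be checked with a simple statistic: the mean pairwise cosine similarity of a sample of vectors. The sketch below uses synthetic vectors that share a large common offset (an assumption standing in for real embeddings) and applies mean-centering, which is one common mitigation and not necessarily the one the article recommends. The names `mean_pairwise_cosine` and `center` are hypothetical helpers.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs."""
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += cosine(vectors[i], vectors[j])
            pairs += 1
    return total / pairs


def center(vectors):
    """Subtract the per-dimension mean of the sample from every vector."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in vectors]


# Synthetic "narrow cone" (assumption): unit Gaussian noise around a shared offset.
rng = random.Random(0)
DIM, COUNT, OFFSET = 32, 200, 3.0
vectors = [[OFFSET + rng.gauss(0, 1) for _ in range(DIM)] for _ in range(COUNT)]

before = mean_pairwise_cosine(vectors)
after = mean_pairwise_cosine(center(vectors))
print(f"mean pairwise cosine before centering: {before:.3f}, after: {after:.3f}")
```

A high average similarity between unrelated vectors means cosine scores are compressed into a narrow range. Centering removes the shared direction in this toy case, but on real embeddings the mean should be estimated from a representative corpus and applied consistently to both documents and queries.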