Chroma vs Weaviate vs Milvus: Picking an Open-Source Vector Database in 2026

For a couple of years the open-source vector database question had a tidy answer shaped like a staircase. Prototyping on your laptop? Chroma. Mid-size production app that wants hybrid search and modules? Weaviate. Hundreds of millions or billions of vectors? Milvus. You located yourself on the scale axis and read off the name. It was a good heuristic, and it has quietly stopped being true.

What broke it is that every one of these projects grew toward the others. The staircase flattened into a plain.

The convergence that ate the old answer

Chroma, the "just pip install it" one, rewrote its core in Rust and stood up Chroma Cloud, a serverless offering that spins up a database in under thirty seconds. So the prototyping tool now scales out. Milvus, the billion-vector behemoth, ships Milvus Lite, an embedded Python mode you install with pip install pymilvus[milvus-lite], so the heavyweight now runs on your laptop. Weaviate sits in the middle and could always go either way, single binary or cluster. The result: all three span laptop to cluster. "Which one scales" no longer separates them, because they all do.

The feature moats are eroding on the same schedule. Hybrid search — keyword BM25 fused with vector similarity — was Weaviate's signature differentiator. Milvus 2.5 made it native. Chroma Cloud offers it. The thing you used to switch databases for is becoming a checkbox all of them tick.
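To make "keyword BM25 fused with vector similarity" concrete, here is a minimal sketch of one common fusion method, reciprocal rank fusion (RRF). It is an illustration, not a claim about the exact algorithm any of the three databases runs. The document IDs, the two input rankings, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings into a single ranking.

    Each document earns 1 / (k + rank) from every ranking it appears in,
    and documents are ordered by their summed score. k=60 is a commonly
    used default, assumed here rather than taken from a specific database.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical results for one query: a keyword (BM25) ranking and a
# vector-similarity ranking over the same three documents.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_c", "doc_a"]

hybrid_ranking = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(hybrid_ranking)
```

In this toy input doc_b comes out first because it places well in both lists, which is the behavior hybrid search is meant to deliver: a document does not need to win either ranking outright to win the fused one.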

So if the spec sheets are converging, what's actually left to decide on?

Default posture: where the tool assumes you are

chroma-core/chroma (Rust, ★ 28.5k)

A developer-experience-first vector database with a Rust core, embedded local mode, and a serverless cloud.

Chroma's default posture is a single process on your machine. You import it, it persists to a folder, and you are doing retrieval in five minutes with no services to operate. The Rust rewrite means that simplicity no longer costs you raw speed the way the old Python core did. Everything about its design assumes you'd rather start coding than start provisioning — and reaches for the cloud only when one process genuinely isn't enough.

milvus-io/milvus (Go, ★ 44.9k)

A high-performance, cloud-native vector database built for billion-scale ANN search, with GPU, DiskANN, IVF and HNSW indexes.

Milvus's default posture is the opposite: a distributed system you operate. Even though Milvus Lite exists, the project's center of gravity is the cluster — separated compute and storage, a menu of index types including GPU-accelerated CAGRA and DiskANN, the architecture you reach for when "billions" is a literal word and not a figure of speech. It's a graduated project of the LF AI & Data Foundation, with the maturity that implies. The price of that ceiling is the floor: run it distributed and you are also running etcd for metadata, object storage like MinIO, and a message queue like Pulsar or Kafka. The benchmark gap between these databases is smaller than the Day-2 operations gap, and Milvus is where that operations gap is widest.

weaviate/weaviate (Go, ★ 16.4k)

A cloud-native Go vector database with built-in hybrid search, vectorizer/generative modules, reranking, and GraphQL/REST/gRPC APIs.

Weaviate's posture is a production service from the start. Not embedded-in-your-process, not a distributed system you babysit — a server you run, with the RAG-pipeline batteries (vectorizers, generative modules, rerankers) already included. It assumes you're building an application that will go to production and want the surrounding machinery in the database rather than glued on. If you want one component to own embedding, search, and reranking together, that's the bet Weaviate makes for you.
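One way to see the three postures side by side is to write down what each default asks you to keep running. The sketch below encodes only what this article describes. The Posture class, its field names, and the service labels are hypothetical, and the counts simplify real deployments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    database: str
    default_mode: str
    services_to_operate: tuple  # long-running services you own on Day 2


# Simplified from the descriptions above; not an inventory of a real cluster.
POSTURES = (
    Posture("Chroma", "embedded, single process", ()),
    Posture("Weaviate", "standalone server", ("weaviate server",)),
    Posture(
        "Milvus",
        "distributed cluster",
        (
            "milvus cluster",
            "etcd (metadata)",
            "object storage (e.g. MinIO)",
            "message queue (Pulsar or Kafka)",
        ),
    ),
)


def by_ops_burden(postures):
    """Order postures from fewest to most services to operate."""
    return sorted(postures, key=lambda p: len(p.services_to_operate))


for p in by_ops_burden(POSTURES):
    print(f"{p.database:<9} {p.default_mode:<26} {len(p.services_to_operate)} service(s)")
```

The ordering is the point, not the exact numbers: each step up adds something you have to monitor, upgrade, and back up.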

How to actually choose

Stop asking which one scales; they all do. Ask where you are right now and what you're willing to operate. If you want to be retrieving vectors before lunch with zero infrastructure, Chroma's single-process default is the path of least resistance. If you genuinely have, or will soon have, billions of vectors and a team to run a distributed system, Milvus has the highest ceiling and the steepest ops bill, and at that scale the bill is worth paying. If you want a production-grade service with the RAG pipeline built in, without either extreme, Weaviate's modular middle is built for exactly that.
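The rule of thumb above can be written as a small function. This is a sketch of the article's heuristic, not a sizing tool: the function name, its parameters, and the one-billion threshold are assumptions made for illustration.

```python
BILLION = 1_000_000_000  # assumed cutoff for "billions is a literal word"


def pick_default_posture(vector_count, has_ops_team, wants_builtin_rag_pipeline):
    """Return the database whose default posture matches where you are now.

    vector_count: vectors you have today or will have soon.
    has_ops_team: whether someone can run a distributed system.
    wants_builtin_rag_pipeline: whether embedding, search, and reranking
        should live inside the database service.
    """
    if vector_count >= BILLION and has_ops_team:
        return "Milvus"
    if wants_builtin_rag_pipeline:
        return "Weaviate"
    return "Chroma"


print(pick_default_posture(50_000, False, False))
print(pick_default_posture(20_000_000, False, True))
print(pick_default_posture(3 * BILLION, True, False))
```

Note what the function does not ask: which database scales. The inputs are about your current data and your appetite for operations, which is the shift in question this section argues for.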

As the feature lists converge, the decision quietly migrates from "what can it do" to "what architecture am I signing up to run." Pick the default posture that matches where you actually are, not the one that matches where you hope to be — you can graduate later, and all three now let you. For the managed-and-Postgres end of this same decision, see pgvector vs Pinecone vs Qdrant; for what you put into these databases, the best embedding models for RAG agents.

Frequently asked

Which open-source vector database is easiest to start with?

Chroma. It installs with pip and runs embedded in your process with persistence to local disk, so you can build a working retrieval prototype without standing up any infrastructure. Its core was rewritten in Rust for speed, and Chroma Cloud exists when you outgrow a single machine — but its default posture is "one process on your laptop."

Do I need Milvus's distributed architecture?

Usually not at first. Full distributed Milvus is built for billion-scale search and pulls in external dependencies — etcd for metadata, object storage like MinIO, and a message queue like Pulsar or Kafka — which is a real Day-2 ops burden. Milvus also ships Milvus Lite, an embedded Python mode, so you can start small and graduate later; the distributed engine (with GPU and DiskANN indexes) is there when scale actually demands it.
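A quick back-of-the-envelope calculation shows when scale starts to demand it. The sketch assumes 768-dimensional float32 embeddings, a hypothetical but common size, and counts raw vector bytes only. Index structures, metadata, and replicas add to the total, and quantization can reduce it.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed to hold the raw vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value


# 768 dimensions is an assumed embedding size, not a figure from any database.
for count in (1_000_000, 100_000_000, 1_000_000_000):
    gigabytes = raw_vector_bytes(count, 768) / 1e9
    print(f"{count:>13,} vectors x 768 dims ~ {gigabytes:,.1f} GB raw")
```

Under these assumptions a million vectors is about 3 GB and fits comfortably on a laptop, while a billion is about 3 TB before any index overhead, which is the territory where disk-based indexes and a distributed cluster start to earn their operational cost.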

Is Weaviate still the one for hybrid search?

It was the clearest early answer — Weaviate built BM25 + vector hybrid search and generative/reranker modules into the core with GraphQL, REST, and gRPC APIs. But the moat is narrowing: Milvus 2.5 added native hybrid search and Chroma Cloud offers it too. Weaviate remains a strong, production-grade Go database; just don't pick it on hybrid search alone anymore.

Dex Mareno
