Vector DB vs. Postgres pgvector: Pick One for Your AI Product
For: A seed-to-Series-A SaaS CTO who just shipped a RAG or semantic search feature on pgvector and is now being told by their infra lead that they need a 'real' vector database before they scale — but has no framework for deciding whether that's a genuine architectural necessity or premature optimization dressed up as engineering rigor
Your infra lead pinged you in Slack: "We need to move off pgvector before the next funding milestone — it won't hold up." You've shipped a RAG feature on Postgres, it works, customers like it, and now someone wants to introduce a second database. The instinct to migrate sounds like engineering rigor. Often it's premature optimization wearing a hard hat.
Most pgvector vs Pinecone comparisons benchmark queries-per-second on a million vectors and call it a day. That's the wrong axis. Throughput is rarely the binding constraint for SaaS workloads under ~10M vectors. The decision that actually matters is whether your product's correctness depends on your vector index agreeing with your relational data — and what you're willing to operate to keep them in sync.
Define the decision crisply
You are choosing between two architectures:
- Option A: pgvector in your existing Postgres. Embeddings live in a vector column next to the row they describe. One database, one transaction boundary, one backup story.
- Option B: A dedicated vector database (Pinecone, Weaviate, Qdrant, Milvus, Vespa). Embeddings live in a purpose-built ANN engine. Your scalar data stays in Postgres. The two are joined by an application-layer ID.
The question is not "which is faster." The question is: at your current scale and trajectory, which architecture's failure modes can you afford?
The five axes that actually matter
1. Transactional consistency between scalar and vector data
This is the axis nobody benchmarks and the one that quietly determines correctness. When a user deletes a document, edits a product description, or revokes access to a record, three things must happen atomically: the row updates, the embedding updates, and the index reflects both. With pgvector, that's one transaction. BEGIN; UPDATE documents... ; COMMIT; and the index is consistent.
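Concretely, here is a minimal sketch of that single transaction boundary, with an illustrative documents table (a 3-dimensional vector keeps the sketch readable; real embeddings would be vector(1536) or similar):

```sql
-- Illustrative schema: the embedding lives next to the row it describes.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id        bigint PRIMARY KEY,
    tenant_id text   NOT NULL,
    body      text   NOT NULL,
    embedding vector(3) NOT NULL  -- vector(1536) or similar in practice
);

-- One transaction covers the row, the embedding, and the index entry.
BEGIN;

UPDATE documents
SET body      = 'revised product description',
    embedding = '[0.12, -0.30, 0.95]'  -- new embedding computed by the app
WHERE id = 43;

DELETE FROM documents WHERE id = 42;

COMMIT;
-- After COMMIT, concurrent similarity searches see either the old state
-- or the new one, never a deleted row still sitting in the index.
```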
With a dedicated vector DB, you are running a distributed write across two systems that were not designed to agree. Either you accept eventual consistency (and the support tickets when a deleted document keeps appearing in search for 30 seconds), or you build an outbox pattern with a worker that retries on failure, or you implement two-phase commit manually. All three are real engineering. None of the vendors will solve it for you.
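If you do take the dedicated-DB route, the outbox half of that pattern can at least live in Postgres, where it is transactional. A hedged sketch with illustrative names (vector_outbox, plus the documents table from above); the worker that drains it and calls the vector DB's API is application code you still have to write:

```sql
-- The outbox row is written in the SAME transaction as the scalar change,
-- so the intent to sync can never be lost between the two systems.
CREATE TABLE IF NOT EXISTS vector_outbox (
    id           bigserial PRIMARY KEY,
    document_id  bigint NOT NULL,
    op           text   NOT NULL CHECK (op IN ('upsert', 'delete')),
    enqueued_at  timestamptz NOT NULL DEFAULT now(),
    processed_at timestamptz  -- NULL until the worker confirms the write
);

BEGIN;
DELETE FROM documents WHERE id = 42;
INSERT INTO vector_outbox (document_id, op) VALUES (42, 'delete');
COMMIT;

-- Worker loop: claim a batch, push to the vector DB, mark processed.
-- SKIP LOCKED lets multiple workers drain the queue without colliding.
SELECT id, document_id, op
FROM vector_outbox
WHERE processed_at IS NULL
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;
```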
If your product is a chat assistant over a knowledge base, eventual consistency is usually fine. If your product gates legal documents, medical records, or financial data on access control filters, it is not. A leaked vector in a healthcare context is a HIPAA event, not a UX bug.
2. Operational complexity budget
How many on-call engineers do you have? If the answer is fewer than five, every new piece of stateful infrastructure has a hidden cost: someone has to understand its failure modes at 2am. pgvector adds zero operational surface — you already run Postgres, you already back it up, you already have a replication story.
A managed vector DB shifts that work to a vendor, but introduces new concerns: a second SLA, a second status page, a second auth model, and a third-party data residency conversation with enterprise customers. Self-hosted Qdrant or Milvus is worse — you now run a distributed system whose recovery semantics you do not yet understand.
3. Where pgvector's real ceiling lives
pgvector is not infinitely scalable, but its ceiling is higher than most teams realize. HNSW indexes in pgvector handle millions of vectors with sub-100ms p95 latency on a properly sized instance. Where it actually breaks down:
- Index build time on large datasets. Building an HNSW index on tens of millions of vectors takes hours and locks writes unless you use CONCURRENTLY (sketched after this list). Reindexing during a model swap is painful.
- Memory pressure. HNSW wants the graph in RAM. If your vector table outgrows shared_buffers + OS page cache, latency cliffs appear.
- Filtered queries with low selectivity. If you're filtering vectors by tenant_id across thousands of tenants with skewed sizes, the planner can make poor choices. Partial indexes and partitioning help, but it's work.
- Hybrid search. Combining BM25 lexical search with vector similarity is awkward in Postgres. Dedicated engines like Vespa and Weaviate handle this natively.
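For the first and third of those, the mitigations look something like the following sketch (illustrative names; the index parameters are starting points, not tuned recommendations):

```sql
-- Build the HNSW index without taking a write lock. CONCURRENTLY cannot
-- run inside a transaction block, and it is slower than a plain build.
CREATE INDEX CONCURRENTLY idx_documents_embedding
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);  -- pgvector's defaults

-- Skewed tenants: a partial index gives an oversized tenant its own
-- graph, so its filtered queries stop scanning everyone else's vectors.
CREATE INDEX CONCURRENTLY idx_documents_embedding_bigco
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WHERE tenant_id = 'bigco';

-- Rough check on the memory ceiling: if this approaches available RAM,
-- expect the latency cliffs described above.
SELECT pg_size_pretty(pg_relation_size('idx_documents_embedding'));
```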
If none of those describe your workload, you do not have a pgvector scaling problem. You have a vendor pitch.
4. Query patterns beyond simple kNN
If all you do is "find the 10 nearest vectors to this query embedding, optionally filtered by one or two scalar fields," pgvector handles it. If you need:
- Multi-vector queries (ColBERT-style late interaction)
- Sparse + dense hybrid retrieval with learned fusion
- Metadata filtering across high-cardinality fields with strict recall guarantees
- Per-tenant index isolation at hundreds-of-thousands-of-tenants scale
...then dedicated engines start earning their keep. Vespa and Weaviate are genuinely better at hybrid retrieval. Pinecone's namespace model is simpler than partitioning Postgres tables.
5. Team familiarity and hiring
Every engineer you hire knows Postgres. Maybe one in ten has run Pinecone in production. Maybe one in fifty has tuned Milvus. This matters less than the other axes, but it compounds: the more exotic your stack, the longer your onboarding, and the more your senior engineers become the bottleneck for any infra change.
Honest scoring
| Axis | pgvector | Dedicated Vector DB |
|---|---|---|
| Transactional consistency with scalar data | Native, free | Application-layer problem you own |
| Operational overhead | None beyond Postgres | New system, new SLA, new failure modes |
| Raw ANN throughput at >50M vectors | Achievable but requires tuning | Generally better out of the box |
| Hybrid (lexical + vector) search | Awkward; possible with extensions | First-class in Weaviate, Vespa |
| Index build / reindex cost | Painful at scale, blocks writes without care | Designed for it |
| Multi-tenancy at extreme scale | Partition or filter; manual | Namespaces, collections built in |
| Team familiarity | Universal | Specialist |
| Vendor lock-in | Low (it's just Postgres) | Moderate to high |
What pgvector is bad at
I am not selling pgvector. It has real weaknesses. Index builds on large tables block or take hours. Memory tuning is unforgiving. Hybrid search requires combining tsvector and vector queries with manual score fusion, which most teams get wrong on the first try. If you swap embedding models, you reindex the world, and Postgres doesn't make that easier. And at the very high end — billions of vectors with strict latency budgets — pgvector will lose to a purpose-built engine.
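To make "manual score fusion" concrete: a hedged sketch of reciprocal rank fusion in plain SQL, assuming the documents table also carries a tsvector column (body_tsv here is illustrative) and using the conventional RRF constant of 60:

```sql
WITH lexical AS (
    SELECT id,
           row_number() OVER (ORDER BY ts_rank(body_tsv, q) DESC) AS rank
    FROM documents, plainto_tsquery('english', 'drug interaction') AS q
    WHERE body_tsv @@ q
    ORDER BY rank
    LIMIT 50
),
semantic AS (
    SELECT id,
           row_number() OVER (ORDER BY embedding <=> '[0.12, -0.30, 0.95]') AS rank
    FROM documents
    ORDER BY rank
    LIMIT 50
)
-- Reciprocal rank fusion: each list contributes 1 / (k + rank), k = 60.
-- A document found by only one retriever still scores via COALESCE.
SELECT id,
       COALESCE(1.0 / (60 + l.rank), 0.0) +
       COALESCE(1.0 / (60 + s.rank), 0.0) AS rrf_score
FROM lexical l
FULL OUTER JOIN semantic s USING (id)
ORDER BY rrf_score DESC
LIMIT 10;
```

The classic first-try mistake is summing raw scores instead of ranks: ts_rank and cosine distance live on incomparable scales, which is exactly why RRF fuses positions rather than scores.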
What dedicated vector DBs are bad at
Equally honest: they are a second source of truth. Every one of them. You will write code to keep them in sync with Postgres, and that code will have bugs. Their query languages are proprietary, their backup tooling is younger than Postgres's, and the managed offerings have had real outages. Pinecone's free tier is generous, but the moment you need filters at scale, costs grow non-linearly. Self-hosted Milvus has a steep learning curve and a non-trivial cluster topology.
The decision rule
Use this in order. Stop at the first match.
If you are pre-product-market-fit or under ~5M vectors
Stay on pgvector. The migration cost buys you nothing. You will iterate on chunking, retrieval strategy, and reranking ten times before vector DB choice matters. Spend the engineering cycles on retrieval quality, not infra.
If your correctness guarantee requires that deletes and ACL changes are immediately reflected in search results
Stay on pgvector. Healthcare, legal, regulated fintech, anything with row-level security — the transactional story is worth more than any throughput win. We saw this play out building HealthPotli's drug interaction system, where embedding lookups had to agree with the live product catalog within the same transaction. Eventual consistency was never on the table.
If you're doing serious hybrid retrieval (lexical + dense + reranking) and recall quality is a product-defining feature
Consider Weaviate or Vespa. Not Pinecone — its hybrid story is weaker than people assume. Vespa in particular is what you want if retrieval quality is your moat. Be honest about whether it actually is.
If you have >50M vectors, multi-tenant isolation across thousands of tenants, and pgvector's index build / memory ceiling is a real, measured problem
Migrate to a dedicated vector DB. At this scale the operational tax is justified. Pinecone for simplicity, Qdrant if you want to self-host, Milvus if you need the most control. Plan the dual-write story carefully — outbox pattern, idempotent consumers, reconciliation jobs.
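The reconciliation job is the piece teams skip and regret. One sketch, building on the outbox above and assuming the documents table tracks updated_at plus a vector_synced_at timestamp that the worker stamps after each confirmed write (both columns are illustrative):

```sql
-- Periodic sweep: re-enqueue any row whose latest change never made it
-- to the vector DB (dropped message, exhausted retries, worker crash).
INSERT INTO vector_outbox (document_id, op)
SELECT d.id, 'upsert'
FROM documents d
WHERE d.updated_at > COALESCE(d.vector_synced_at, 'epoch'::timestamptz)
  AND NOT EXISTS (           -- don't double-enqueue work already pending
      SELECT 1 FROM vector_outbox o
      WHERE o.document_id = d.id AND o.processed_at IS NULL
  );
```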
If your infra lead is pushing the migration but cannot point to a specific failing metric
Stay on pgvector and ask for numbers. "It won't scale" is not an argument. "Our p95 retrieval latency is 340ms at 8M vectors and we project 40M in twelve months" is. Without the second sentence, you are migrating on vibes.
The migration nobody talks about
One thing rarely covered: migrating off a dedicated vector DB back to pgvector is harder than the other direction. The proprietary query semantics, the namespace model, the hybrid scoring — your application code grows hooks into all of it. Starting on pgvector and migrating out when you have a measured reason is a one-way door you can defer. Starting on Pinecone and discovering six months later that your real problem was retrieval quality, not infra, is a more expensive lesson.
For most seed-to-Series-A SaaS products shipping RAG or semantic search, pgvector in production is not a compromise. It's the right answer. The teams we've worked with across fintech, healthcare, and logistics ship AI features faster when they keep their data in one place and treat the vector index as an extension of their existing database, not a separate system to coordinate with.
Revisit the decision when you have evidence — a real benchmark, a real query pattern that pgvector can't serve, a real consistency model that says eventual is fine. Until then, the boring answer is the correct one.
Frequently Asked Questions
Is pgvector production-ready for AI products?
Yes, for the vast majority of SaaS workloads. pgvector with HNSW indexes handles millions of vectors with sub-100ms latency on appropriately sized Postgres instances. The real production risks are index build time on large tables, memory tuning, and hybrid search ergonomics — not raw query performance. Companies running RAG and semantic search features at meaningful scale do so on pgvector every day.
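If you're validating that claim on your own data, the runtime knob worth knowing is hnsw.ef_search, which trades recall for latency per query (the value below is a starting point, not a recommendation):

```sql
-- pgvector's default is 40; raising it improves recall at some latency
-- cost. Set it per session or per transaction while benchmarking.
SET hnsw.ef_search = 100;

SELECT id, body
FROM documents
ORDER BY embedding <=> '[0.12, -0.30, 0.95]'  -- query embedding
LIMIT 10;
```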
When should I actually migrate from pgvector to a dedicated vector database?
Migrate when you have measured evidence: p95 latency exceeding your SLO at current volume, index rebuild times blocking deployments, or hybrid retrieval requirements pgvector cannot meet. "We'll need to scale eventually" is not evidence. If you have tens of millions of vectors, thousands of tenants needing isolation, or hybrid retrieval as a core product differentiator, a dedicated engine becomes defensible.
What's the biggest hidden cost of using Pinecone or another dedicated vector DB?
Dual-write consistency. Your scalar data lives in Postgres; your vectors live in the vector DB; nothing makes them agree. You will build an outbox pattern, retry logic, and reconciliation jobs — and those will have bugs. For products where deletes and access-control changes must be reflected in search results immediately, this is a correctness problem, not a performance one.
How does pgvector compare to Pinecone for RAG applications?
For most RAG workloads under ~10M vectors, pgvector matches Pinecone on latency and beats it on operational simplicity. Pinecone wins on out-of-the-box scaling to very large indexes, namespace-based multi-tenancy, and zero infrastructure management. pgvector wins on transactional consistency, cost predictability, vendor independence, and team familiarity. Most seed-to-Series-A teams should default to pgvector.
Can pgvector handle hybrid search combining keyword and semantic search?
Technically yes, by combining Postgres full-text search (tsvector) with vector similarity and fusing scores at the application layer. Pragmatically, it's awkward, easy to get wrong, and lacks the learned fusion approaches that Vespa and Weaviate offer natively. If hybrid retrieval quality is central to your product, evaluate dedicated engines. If it's a nice-to-have, pgvector with reciprocal rank fusion is fine.
How long does it take to migrate from pgvector to a dedicated vector database?
It depends heavily on dataset size, embedding pipeline complexity, and how deeply hybrid search is wired into your application. Building the dual-write infrastructure and consistency model usually takes longer than the data migration itself. For a personalized assessment of your specific architecture, contact CodeNicely.
Found this useful? CodeNicely publishes engineering and product playbooks weekly. Browse the archive or tell us what you're building.