Building a RAG Pipeline with Spring AI and pgvector
The “Python Tax” is officially repealed.
For too long, the ‘AI Engineering’ world has been gatekept by Python. If you wanted to build a RAG (Retrieval Augmented Generation) pipeline, you had to spin up a FastAPI service, manage a fragile `requirements.txt`, and bridge it to your robust Java backend via REST. It was brittle, operationally complex, and frankly, unnecessary.
As of 2026, with the maturity of Spring AI 1.0+ and the widespread adoption of PostgreSQL pgvector, Java developers can now build end-to-end, production-grade GenAI applications without writing a single line of Python. This guide is your blueprint.
- Spring AI: Provides a portable API across OpenAI, Bedrock, and Gemini, and handles the integration “glue” (clients, auth, request/response mapping) for you.
- pgvector: Turns your existing Postgres instance into a Vector Database. No new vendors, no new contracts.
- Java 21+: Virtual Threads enable highly concurrent, I/O-bound ingestion pipelines that comfortably outpace Python’s async event loop.
1. The Architecture: Keep It Single-Stack
In the Python-centric world, a RAG architecture typically involves a mess of microservices. In the Spring world, we collapse this complexity.
Notice what’s missing: Vector DB Glue Code. Because we are using Postgres, our transactional data (e.g., “Is this user a premium member?”) lives right next to our vector data. We can join them in a single SQL query. That is a superpower specialized Vector DBs generally lack.
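As a sketch of that join superpower: assume a hypothetical users table with a premium flag, and chunks tagged with a userId key in their metadata (neither is part of Spring AI’s schema; both are illustrative). One query filters by membership and ranks by similarity:
-- Sketch only: 'users' and the metadata keys are illustrative
SELECT vs.content
FROM vector_store vs
JOIN users u ON u.id = (vs.metadata->>'userId')::uuid
WHERE u.premium = true
ORDER BY vs.embedding <=> $1  -- '<=>' is pgvector's cosine distance operator
LIMIT 5;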
2. Setting Up the Foundation
Dependencies (Gradle)
First, let’s pull in the Spring AI BOM and the pgvector starter. Note that in 2026, we are using the `1.0.0` (or newer) release train.
dependencies {
    // The core model and vector-store starters (artifact IDs were renamed for the 1.0 release train)
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
    // QuestionAnswerAdvisor used in the RAG controller below
    implementation 'org.springframework.ai:spring-ai-advisors-vector-store'
    // Tika-based DocumentReader used in the ingestion pipeline
    implementation 'org.springframework.ai:spring-ai-tika-document-reader'
    // For robust ETL processing
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    // Postgres driver
    implementation 'org.postgresql:postgresql'
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
}
dependencyManagement {
imports {
mavenBom "org.springframework.ai:spring-ai-bom:1.0.0"
}
}
Database Schema (The Search Index)
You don’t need a complex migration script. Spring AI can auto-initialize the schema, but as senior engineers, we prefer explicit control. Enable the extensions and create an HNSW index for speed.
-- Enable the extensions (run once)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- provides uuid_generate_v4() used below
-- The standard Spring AI table structure
CREATE TABLE IF NOT EXISTS vector_store (
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
content text,
metadata json,
embedding vector(1536) -- OpenAI uses 1536 dimensions
);
-- CRITICAL: Create an HNSW index for performance
-- Without this, queries will be full table scans (slow!)
CREATE INDEX ON vector_store USING hnsw (embedding vector_cosine_ops);
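If you would rather let Spring AI manage the schema, the pgvector starter exposes properties for exactly this. A minimal application.yml sketch matching the DDL above:
spring:
  ai:
    vectorstore:
      pgvector:
        initialize-schema: false     # we own the DDL above; set true to let Spring AI create it
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536             # must match the embedding model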
3. The Ingestion Pipeline (ETL)
A RAG system is only as good as its data. “Garbage In, Garbage Out.” We need to Chunk, Embed, and Store.
The Document Reader
Spring AI provides `DocumentReader` interfaces for PDF, JSON, and Text. Here is a robust service that ingests a document:
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IngestionService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        // Split by tokens (better for LLM context windows).
        // Args: chunk size (tokens), min chunk size (chars),
        // min chunk length to embed, max chunks, keep separators
        this.textSplitter = new TokenTextSplitter(800, 350, 5, 10000, true);
    }

    @Transactional
    public void ingestFile(Resource file) {
        // 1. Read
        TikaDocumentReader loader = new TikaDocumentReader(file);
        List<Document> documents = loader.get();

        // 2. Transform (Chunking)
        // This is crucial. Sending a 50-page PDF to an embedding model fails.
        // We break it into context-window-sized semantic chunks.
        List<Document> splitDocuments = textSplitter.apply(documents);

        // 3. Load (Embed & Persist)
        // Spring AI handles the call to the OpenAI embedding API
        // and the SQL INSERT behind the scenes.
        vectorStore.add(splitDocuments);
    }
}
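To wire this up, here is a minimal sketch that ingests a classpath document on startup. The file name is illustrative; in production you would trigger ingestion from an upload endpoint or a Spring Batch job instead:
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

@Configuration
class IngestionConfig {

    @Bean
    CommandLineRunner ingestOnStartup(IngestionService ingestionService,
            @Value("classpath:docs/handbook.pdf") Resource handbook) {
        // Runs once at application boot.
        return args -> ingestionService.ingestFile(handbook);
    }
}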
4. The Retrieval & Generation (The “Chat”)
Now for the fun part. In Spring AI 1.0, the `ChatClient` has evolved into a fluent, highly testable API. We will use the `QuestionAnswerAdvisor` pattern to handle the RAG logic automatically.
import java.util.Map;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/chat")
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder builder, VectorStore vectorStore) {
        // We configure the internal RAG logic here (1.0 builder-style API)
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
                        .searchRequest(SearchRequest.builder()
                                .topK(5)                  // Retrieve the 5 most similar chunks
                                .similarityThreshold(0.7) // Filter out noise
                                .build())
                        .build())
                .build();
    }

    @PostMapping
    public Map<String, String> chat(@RequestBody String userQuery) {
        // The framework automatically:
        // 1. Vectorizes the 'userQuery'
        // 2. Queries pgvector for context
        // 3. Stuffs the context into the prompt
        // 4. Calls the LLM
        String response = chatClient.prompt()
                .user(userQuery)
                .call()
                .content();
        return Map.of("response", response);
    }
}
That’s it. A few dozen lines of code for a full RAG endpoint. No LangChain spaghetti. No separate Python service.
5. Advanced Techniques for 2026
Basic RAG is easy. Production RAG is hard. Here is how to handle the edge cases.
Metadata Filtering (Hybrid Search)
Pure semantic search is often imprecise. If a user asks “What were my earnings in 2024?”, a vector search might return earnings from 2023 because they look “semantically similar.”
We solve this with Metadata Filtering. This is where pgvector shines: it combines JSON metadata filtering with vector search in a single query.
FilterExpressionBuilder b = new FilterExpressionBuilder();

// Create a filter: ONLY search documents belonging to this user AND year 2024
Filter.Expression filter = b.and(
        b.eq("userId", currentUser.getId()),
        b.eq("year", 2024)
).build();

List<Document> results = vectorStore.similaritySearch(
        SearchRequest.builder()
                .query(userQuery)
                .filterExpression(filter)
                .build()
);
Under the hood, this compiles to a standard SQL `WHERE` clause on the `metadata` column in Postgres, evaluated in the same query as the vector distance ranking, so there is no second round trip.
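For those filter keys to exist at query time, they have to be attached during ingestion. A minimal sketch inside ingestFile, just before the vectorStore.add(...) call (ownerId is an assumed parameter; the keys are illustrative and must match your filters):
// Tag every chunk with filterable metadata before persisting.
// 'ownerId' is assumed to be passed into ingestFile.
for (Document doc : splitDocuments) {
    doc.getMetadata().put("userId", ownerId);
    doc.getMetadata().put("year", 2024);
}
vectorStore.add(splitDocuments);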
Re-Ranking (The Precision Booster)
Sometimes vector search retrieves “related” but irrelevant documents. In 2026, it is standard practice to add a Re-ranking step. You retrieve 20 documents from Postgres, and then pass them through a specialized Cross-Encoder model (like Cohere Rerank) to sort them by true relevance.
Spring AI’s modular RAG components (such as `DocumentRetriever` chains and post-retrieval processing) let you plug a re-ranker in transparently.
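A minimal sketch of the over-fetch-then-re-rank shape, assuming a hypothetical rerankClient that wraps a cross-encoder API (Spring AI does not ship one; this is the integration point you implement):
// Over-fetch 20 candidates from pgvector, then let a cross-encoder re-order them.
List<Document> candidates = vectorStore.similaritySearch(
        SearchRequest.builder().query(userQuery).topK(20).build());

// Hypothetical client: scores each (query, document) pair by true relevance.
List<Document> reranked = rerankClient.rerank(userQuery, candidates);

// Keep only the best 5 for the prompt context.
List<Document> topDocuments = reranked.subList(0, Math.min(5, reranked.size()));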
6. Comparison: Spring AI vs. LangChain4j
| Feature | Spring AI | LangChain4j |
|---|---|---|
| Philosophy | Spring-Native, Opinionated, Integration-heavy | Framework-agnostic, Agent-heavy, Cutting-edge |
| Configuration | Standard `application.yml` properties | More programmatic builder patterns |
| Agent Support | Growing (Function Calling), but simpler | First-class citizen (Autonomous Agents) |
Verdict: If you are building a transactional Enterprise App where AI is a feature (e.g., a “Co-pilot” for a dashboard), use Spring AI. It fits your lifecycle. If you are building a pure AI Agent that runs autonomously, LangChain4j might offer more flexibility.
7. Why “No Python” Matters for Enterprises
It is not just about language preference. It is about Operational Homogeneity.
- Unified Security: You use the same Spring Security context, OIDC/OAuth2 flows, and Vault secrets for your AI logic as you do for your banking logic.
- Single CI/CD Pipeline: One Jenkins/GitHub Actions pipeline builds a single JAR. No second container image pinned to a specific Python version.
- Type Safety: Java Records map JSON responses to strong types. No more `KeyError` at runtime because an LLM hallucinated a JSON field name.
- Thread Management: Virtual Threads in Java 21+ allow you to handle thousands of concurrent LLM requests (which are I/O bound) with a minimal memory footprint, far outperforming standard Python deployments without complex optimizations. Enabling them is a one-line property, shown below.
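For reference, enabling virtual threads for request handling in Spring Boot 3.2+ is a single property:
spring:
  threads:
    virtual:
      enabled: true   # Tomcat request handling and task executors use virtual threads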
Conclusion
The days of needing a separate “AI Team” writing Python scripts in a silo are over. With Spring AI and pgvector, AI Engineering is now just… Software Engineering.
You have the database (Postgres). You have the runtime (JVM). You have the framework (Spring). You have everything you need to build the next generation of intelligent applications today.
FAQ
Can I swap OpenAI for a local model?
Spring AI supports Ollama out of the box. Swap in the Ollama starter, point `spring.ai.ollama.base-url` at your local instance, and the `ChatClient` abstraction stays exactly the same, switching transparently from OpenAI to your local Llama 3 instance. This is perfect for local dev loops.
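A sketch of that local-dev setup, assuming the Ollama starter (spring-ai-starter-model-ollama) is on the classpath and a model has been pulled locally:
spring:
  ai:
    ollama:
      base-url: http://localhost:11434   # Ollama's default port
      chat:
        options:
          model: llama3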
Does pgvector scale, or do I need a dedicated vector database?
For massive scale (100M+ vectors), dedicated vector DBs (Milvus, Weaviate) or specialized indexing (DiskANN) might be necessary. However, for the vast majority of corporate use cases (support docs, wikis, user history), the data fits comfortably within pgvector limits; 10M-50M vectors is very doable with HNSW.
How do I keep the index in sync when documents change?
This is the hard part of RAG. You need “upsert” logic. In the `IngestionService` above, you would ideally checksum the file content before embedding. If the checksum hasn’t changed, skip re-embedding (saving money). If it has, delete the old chunks by `documentId` and re-insert the new ones.
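A minimal sketch of that guard, assuming a checksums store you maintain yourself (a small JDBC table or map), chunks tagged with a documentId key in their metadata, and a VectorStore that supports delete-by-filter (available in recent Spring AI releases):
// Sketch: 'checksums' is an assumed lookup you maintain (e.g., a small JDBC table).
public void upsertFile(String documentId, Resource file) throws IOException {
    String checksum = DigestUtils.sha256Hex(file.getInputStream()); // commons-codec

    if (checksum.equals(checksums.get(documentId))) {
        return; // content unchanged: skip the (paid) embedding call entirely
    }

    // Delete stale chunks for this document, then re-chunk and re-embed.
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    vectorStore.delete(b.eq("documentId", documentId).build());

    ingestFile(file);
    checksums.put(documentId, checksum);
}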

For over 15 years, I have worked as a hands-on Java Architect and Senior Engineer, specializing in building and scaling high-performance, enterprise-level applications. My career has focused primarily on the FinTech, Telecommunications, and E-commerce sectors, where I’ve led teams in designing systems that handle millions of transactions per day.
Check out my profile here: https://simplifiedlearningblog.com/author/