Large Language Models (LLMs) are powerful, but they don’t know everything.
Their knowledge is limited to the data they were trained on – and that training cut-off might be months or years ago.
So how do you make them “smarter” with your own data, like company documents, product FAQs, or knowledge bases?
The answer: Retrieval-Augmented Generation (RAG).
1. What is RAG?
Retrieval-Augmented Generation is a technique that combines:
- Embeddings – Converting text into numerical vectors that capture semantic meaning.
- Vector Stores – Databases optimized to store embeddings and quickly find similar ones.
- LLM Generation – Augmenting the user’s query with retrieved, relevant documents before sending it to the model.
Instead of asking the LLM a question “from scratch,” you give it both the question and context from your own data.
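Before we start: the examples in this post assume a Spring Boot project with the Spring AI OpenAI starter on the classpath and an API key configured. A minimal setup sketch (artifact and property names may vary slightly between Spring AI versions):
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
# application.properties
spring.ai.openai.api-key=${OPENAI_API_KEY}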
2. Creating Embeddings
Embeddings are how we turn text into vectors.
Spring AI provides a simple API for generating embeddings with providers like OpenAI.
import org.springframework.ai.embedding.EmbeddingClient;
import java.util.List;

public class EmbeddingExample {

    private final EmbeddingClient embeddingClient;

    // Inject the EmbeddingClient interface so the provider (e.g., OpenAI) stays swappable.
    public EmbeddingExample(EmbeddingClient embeddingClient) {
        this.embeddingClient = embeddingClient;
    }

    // embed(String) returns the embedding vector for the given text.
    public List<Double> createEmbedding(String text) {
        return embeddingClient.embed(text);
    }
}
Input: Spring AI makes it easy to work with LLMs in Java
Output: [0.021, -0.134, 0.873, …] (a vector of floating-point numbers)
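Why does this help? Semantically similar texts end up as nearby vectors, and "nearby" is typically measured with cosine similarity. A quick plain-Java sketch of the idea (no Spring AI required; vector stores compute this for you internally):
// Cosine similarity: 1.0 = same direction (very similar), near 0 = unrelated.
public static double cosineSimilarity(List<Double> a, List<Double> b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.size(); i++) {
        dot += a.get(i) * b.get(i);
        normA += a.get(i) * a.get(i);
        normB += b.get(i) * b.get(i);
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}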
3. Storing Embeddings in a Vector Store
Spring AI supports several vector store implementations (Chroma, PostgreSQL/pgvector, Milvus, Pinecone, Redis, etc.).
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.List;
import java.util.Map;

public class VectorStoreExample {

    private final VectorStore vectorStore;

    // Inject the VectorStore interface; the concrete store (Chroma, pgvector, ...) is configured elsewhere.
    public VectorStoreExample(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Wrap the content in a Document; the store embeds and indexes it on add().
    public void storeDocument(String id, String content) {
        Document doc = new Document(id, content, Map.of());
        vectorStore.add(List.of(doc));
    }
}
With a vector store in place, our knowledge (e.g., documentation, FAQs) becomes searchable in vector space.
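One practical note: real documents are usually too long to embed as a single unit, so they are split into chunks first. A sketch using Spring AI's TokenTextSplitter (longText is a placeholder for your document's text; the splitter API may differ slightly across versions):
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import java.util.List;

// Split a long text into token-sized chunks, then index each chunk separately.
TokenTextSplitter splitter = new TokenTextSplitter();
List<Document> chunks = splitter.apply(List.of(new Document(longText)));
vectorStore.add(chunks);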
4. Retrieving Relevant Documents
When the user asks a question, we embed it and search the vector store for the most similar documents. Here we extend VectorStoreExample with a search method:
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.List;
import java.util.Map;

public class VectorStoreExample {

    private final VectorStore vectorStore;

    public VectorStoreExample(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void storeDocument(String id, String content) {
        Document doc = new Document(id, content, Map.of());
        vectorStore.add(List.of(doc));
    }

    // similaritySearch returns the documents closest to the query in vector space.
    public List<String> search(String query) {
        List<Document> results = vectorStore.similaritySearch(
                SearchRequest.query(query).withTopK(3)); // return top 3 matches
        return results.stream()
                .map(Document::getContent)
                .toList();
    }
}
Query: “What is Spring AI?”
Retrieved documents:
- “Spring AI is a project that helps Java developers work with LLMs.”
- “It provides abstractions for prompts, chat memory, embeddings, and more.”
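Besides withTopK, SearchRequest can also filter out weak matches. A sketch, assuming your Spring AI version exposes a similarity threshold on SearchRequest:
// Return at most 3 matches, and only those scoring at least 0.75 similarity.
List<Document> results = vectorStore.similaritySearch(
        SearchRequest.query(query)
                .withTopK(3)
                .withSimilarityThreshold(0.75));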
5. Building a RAG Pipeline
Finally, combine retrieval with the LLM.
Instead of sending only the user’s question, prepend relevant documents to the prompt.
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.List;

public class RagExample {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RagExample(ChatClient chatClient, VectorStore vectorStore) {
        this.chatClient = chatClient;
        this.vectorStore = vectorStore;
    }

    public String ask(String question) {
        // 1. Retrieve: find the documents most similar to the question.
        List<String> contextDocs = vectorStore.similaritySearch(
                        SearchRequest.query(question).withTopK(3)).stream()
                .map(Document::getContent)
                .toList();

        // 2. Augment: prepend the retrieved context to the user's question.
        String augmentedPrompt = """
                Use the following context to answer the question:
                %s

                Question: %s
                """.formatted(String.join("\n", contextDocs), question);

        // 3. Generate: send the augmented prompt to the LLM and extract the text reply.
        return chatClient.call(new Prompt(new UserMessage(augmentedPrompt)))
                .getResult().getOutput().getContent();
    }
}
User: “What features does Spring AI provide?”
Retrieved Context (from our vector store):
- “Spring AI supports chat memory, prompt templates, and structured outputs.”
- “It integrates with multiple AI providers and supports embeddings and vector stores.”
LLM Response (augmented):
- Spring AI provides features like chat memory, prompt templates, structured outputs, and embeddings with vector stores.
- It also supports multiple AI providers.
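Worth knowing: recent Spring AI releases can do this retrieve-and-prepend step for you via advisors. A sketch, assuming a ChatModel bean (chatModel) and the QuestionAnswerAdvisor available in newer versions (package and module locations vary between releases):
// The advisor runs the similarity search and augments the prompt automatically.
String answer = ChatClient.builder(chatModel)
        .build()
        .prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(question)
        .call()
        .content();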
Conclusion
With embeddings, vector stores, and RAG, we can:
- Extend LLMs with our own knowledge base.
- Build search-augmented chatbots.
- Enable semantic document search.
Spring AI makes this seamless with its unified abstraction layer, so you can switch vector DB providers without rewriting your code.
It's time to store your company wiki, support tickets, or product docs in a vector store and build a chatbot that knows your business inside out!