
What to do when AI can't read your system? Using RAG to build domain knowledge in microservices

There’s a moment every developer on a complex microservices project knows well. You ask your AI assistant to review a pull request. It looks at the code, flags a few style issues, suggests some improvements. All reasonable. All safe.

And then it completely misses the thing that matters.

The first thought might be that AI is simply bad at reading code. That couldn’t be more wrong – the problem was never the code.

Here’s how one of Boldare’s teams solved it.



The hidden problem

In event-sourced microservices architectures, a significant portion of business logic is invisible to any tool that reads files.

An event gets emitted in one service and consumed somewhere else entirely. The relationship between them exists in business understanding – in the heads of engineers who’ve spent years on the project, in Confluence pages that are six months out of date, in Slack threads that nobody can find.

When you ask an AI to review a PR in this kind of system, it sees the changed files. But it does not see what happens downstream when a given aggregate changes. It cannot know which events propagate across service boundaries. It has no idea that a seemingly harmless change in one context will break something three services away.

The result: edge cases that should be caught in review get through. And catching them requires pulling in a senior engineer who carries years of domain context in their head – someone who is expensive, busy, and shouldn’t be the last line of defence against a production incident.

This was exactly the situation on a long-running platform built on Java, Spring, and EventStore. The team had started using Claude Code heavily for development. Velocity went up. Code quality improved. But the AI kept hitting the same wall: it didn’t understand the domain.

So they decided to build it a memory.

What RAG actually solves here

Retrieval-Augmented Generation is often explained as “giving the AI access to your documents.” That framing undersells it and also misses the point.

The real value in a complex domain isn’t access to documents alone. It’s access to relationships – how concepts connect, what depends on what, what breaks when something changes.

The team made a deliberate architectural choice: the knowledge base would not be built from code. It would be built from domain knowledge and inter-service event relationships, operating at a high level of abstraction.

Concretely, this meant documenting things like:

  • How business contexts relate to each other
  • What happens downstream when a given aggregate changes
  • Which events propagate across service boundaries and to which consumers
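
To make that concrete, an entry in such a knowledge base might look like the following (the names are purely illustrative, not the client’s actual domain):

```
Event: OrderConfirmed
Emitted by: ordering-service (Order aggregate)
Consumed by: billing-service   -> creates Invoice
             shipping-service  -> schedules Shipment
             notifications     -> sends confirmation email
Change impact: renaming or removing a field requires coordinated deploys of
billing-service and shipping-service; notifications tolerates unknown fields.
```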

This is knowledge that exists in the system but is not readable from the system. RAG made it queryable.

The setup

The technical implementation was deliberately constrained. The client’s data security requirements meant nothing could leave their environment, so the entire stack runs locally.

  • Qdrant as the vector database, deployed on Docker
  • mcp-server-qdrant as the MCP server, running in stdio mode with the all-MiniLM-L6-v2 embedding model
  • A Claude Code agent configured to query the knowledge base before making assessments
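
As an illustration of how compact the wiring is – the collection name is arbitrary, and the environment variables follow the mcp-server-qdrant README, so treat them as assumptions to verify against your version – Qdrant runs as a container and Claude Code launches the MCP server over stdio from the project’s .mcp.json:

```bash
# Local Qdrant – nothing leaves the machine
docker run -d --name qdrant -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage qdrant/qdrant
```

```json
{
  "mcpServers": {
    "domain-knowledge": {
      "command": "uvx",
      "args": ["mcp-server-qdrant"],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "COLLECTION_NAME": "domain-knowledge",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
      }
    }
  }
}
```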

The MCP integration is what makes it work in practice. Rather than requiring developers to manually retrieve context before prompting, the agent queries the domain knowledge base automatically as part of its review process. When it looks at a PR, it doesn’t just see the diff – it sees the business context surrounding the changed code.

The result, as described by one of the engineers building it: edge cases that would previously have required a senior engineer with years of domain experience to catch are now surfaced during AI-assisted review.

The three problems they hit

Building this wasn’t straightforward. Three issues came up that anyone doing something similar will face.

Keeping the knowledge base current

A static knowledge base is a liability. Domain knowledge evolves, services get refactored, event schemas change. If the RAG index doesn’t keep up, the assistant starts answering confidently from outdated context – which is worse than no context at all.

The solution the team landed on: incremental re-indexing based on git diff. Rather than rebuilding the entire index, a post-merge hook identifies changed files and updates only the affected vectors in Qdrant. Zero manual work, and the knowledge base stays current with the codebase.
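
A minimal sketch of such a hook in Python – the paths, collection name, and markdown-only filter are assumptions, not the team’s actual script:

```python
#!/usr/bin/env python3
"""Post-merge hook: re-index only the knowledge files the merge touched."""
import os
import subprocess
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

COLLECTION = "domain-knowledge"
client = QdrantClient(url="http://localhost:6333")
model = SentenceTransformer("all-MiniLM-L6-v2")

# Files changed by the merge that just landed (HEAD@{1} = pre-merge HEAD)
changed = subprocess.check_output(
    ["git", "diff", "--name-only", "HEAD@{1}", "HEAD"], text=True
).splitlines()

for path in changed:
    if not path.endswith(".md"):          # domain docs assumed to be markdown
        continue
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, path))  # stable ID per file
    if not os.path.exists(path):          # file deleted: drop its vector
        client.delete(collection_name=COLLECTION, points_selector=[point_id])
        continue
    text = open(path, encoding="utf-8").read()
    client.upsert(                        # same ID -> overwrite, not duplicate
        collection_name=COLLECTION,
        points=[PointStruct(id=point_id,
                            vector=model.encode(text).tolist(),
                            payload={"source": path, "text": text})],
    )
```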

An alternative approach for teams without the hook infrastructure: hash-based caching per file. Compare the current MD5 of each file against the stored hash and reindex only where they differ.
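
A sketch of that variant using only the standard library (the cache file location is arbitrary):

```python
import hashlib
import json
from pathlib import Path

HASH_CACHE = Path(".rag-index-hashes.json")  # hypothetical cache location

def files_needing_reindex(paths: list[str]) -> list[str]:
    """Return only the files whose MD5 changed since the last indexing run."""
    cache = json.loads(HASH_CACHE.read_text()) if HASH_CACHE.exists() else {}
    stale = []
    for path in paths:
        digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
        if cache.get(path) != digest:
            stale.append(path)
            cache[path] = digest
    HASH_CACHE.write_text(json.dumps(cache, indent=2))
    return stale
```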

Too much detail and too many tokens

The instinct when building a knowledge base is to put everything in. This turns out to be counterproductive – overly granular chunks produce noisy retrieval and burn tokens on context that doesn’t help the model reason better.

The team’s approach was to work at a high level of abstraction from the start: relationships and dependencies rather than implementation details. This is the right instinct. A few additional techniques that help:

Chunking per content type – different document types benefit from different chunk sizes. Architecture decision records warrant different treatment than API documentation or event schemas.

Parent-child chunks – index small chunks (around 500 tokens) for precision in retrieval, but return the parent chunk (around 1500 tokens) as the actual context. This gives you the best of both: accurate retrieval and sufficient context for the model to reason with.
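
A sketch of the pattern – chunk sizes are approximated with whitespace tokens for brevity, and the payload keys are illustrative:

```python
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
model = SentenceTransformer("all-MiniLM-L6-v2")

def _split(words: list[str], size: int) -> list[str]:
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def index_parent_child(doc: str, source: str) -> None:
    """Embed ~500-token children for precise retrieval; store the ~1500-token
    parent in the payload so it can be returned as the actual context."""
    points = []
    for parent in _split(doc.split(), 1500):
        for child in _split(parent.split(), 500):
            points.append(PointStruct(
                id=str(uuid.uuid4()),
                vector=model.encode(child).tolist(),
                payload={"source": source, "parent": parent},
            ))
    client.upsert(collection_name="domain-knowledge", points=points)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Search against the small chunks, hand the model the parent text."""
    hits = client.search(collection_name="domain-knowledge",
                         query_vector=model.encode(query).tolist(), limit=k)
    return [hit.payload["parent"] for hit in hits]
```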

Hallucinations without domain correction

An AI that doesn’t know what it doesn’t know is dangerous in a complex system. Without domain context, models fill gaps with plausible-sounding but incorrect assumptions about how services relate.

The fix is in the retrieval pipeline rather than the model. The team uses hybrid retrieval combining dense vector search with BM25 sparse search, followed by re-ranking with a local CrossEncoder model (mmarco-mMiniLMv2-L12-H384-v1, around 900MB). Re-ranking from top-20 to top-5 candidates significantly reduces the noise that causes hallucinations.

The CrossEncoder runs locally, which matters both for data security and latency. It produces comparable quality to paid re-ranking APIs.
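
A sketch of that pipeline – the BM25 corpus is kept in memory here for brevity and the payload key is an assumption, but the dense leg and re-ranker match the stack described above:

```python
from qdrant_client import QdrantClient
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
encoder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")

def hybrid_search(query: str, corpus: list[str],
                  top_n: int = 20, final_k: int = 5) -> list[str]:
    # Dense leg: vector search in Qdrant
    dense = client.search(collection_name="domain-knowledge",
                          query_vector=encoder.encode(query).tolist(),
                          limit=top_n)
    dense_texts = [hit.payload["text"] for hit in dense]

    # Sparse leg: BM25 over the same documents
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    sparse_texts = bm25.get_top_n(query.split(), corpus, n=top_n)

    # Merge and dedupe, then let the CrossEncoder cut top-20 down to top-5
    candidates = list(dict.fromkeys(dense_texts + sparse_texts))
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```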

One additional technique worth noting: Anthropic’s Contextual Retrieval approach – adding a short context summary to each chunk before embedding – reduces failed retrievals by 49–67% when combined with BM25 and re-ranking. For a domain knowledge base where retrieval failures have real consequences, that’s a meaningful improvement.
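
The pattern is simple enough to sketch; `summarize` stands in for whatever (local) LLM call generates the situating sentences – a hypothetical helper, not a specific API:

```python
from typing import Callable

def contextualize(chunk: str, full_doc: str,
                  summarize: Callable[[str], str]) -> str:
    """Prefix a chunk with a short LLM-written note situating it in the whole
    document; the combined text is what gets embedded (and BM25-indexed)."""
    prompt = (
        "<document>\n" + full_doc + "\n</document>\n"
        "Write 1-2 sentences situating the following chunk within the "
        "document, to improve search retrieval of the chunk:\n" + chunk
    )
    return summarize(prompt) + "\n\n" + chunk
```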

What this changes

The conventional narrative around AI in software development focuses on code generation speed. Write more code, write it faster.

That’s real. But in complex, long-running systems, the main bottleneck is often understanding the system well enough to write the right code.

Senior engineers on these projects spend a disproportionate amount of their time answering domain questions: What does this event affect? Is it safe to change this aggregate? Who consumes this topic?

A domain knowledge RAG doesn’t replace that expertise. But it makes it accessible to the AI tools the team is already using – which means junior developers get answers faster, code review catches more, and the senior engineers can spend their time on problems that actually require their judgment.

The key insight from building this: the quality of AI-assisted development in complex systems depends more on retrieval quality than on the model itself. A well-configured retrieval pipeline over good domain knowledge will outperform a better model with no context.

The bigger pattern

Event-sourced microservices present a particularly sharp version of a problem that exists in any system with distributed, implicit dependencies. The same challenge appears in:

  • Large monoliths where business rules are scattered across modules
  • Systems with heavy use of side effects and async processing
  • Platforms that have evolved over years without consistent documentation

In all of these cases, the AI’s limitation isn’t intelligence – it’s context. RAG is a way to give it the context that doesn’t live in the codebase.

The team is still in the testing phase. The architecture is in place, the knowledge base is being built out, and early results are promising enough that the approach has been recommended to other teams working on similar problems.

Whether it scales, whether the maintenance overhead stays manageable, whether the retrieval quality holds up as the knowledge base grows – those are open questions. But the core hypothesis has held: an AI that understands your domain makes fewer dangerous mistakes than one that doesn’t.

And in production systems, fewer dangerous mistakes is worth quite a lot.