Sulabh Sethi · Blog

How RAG Works, and Why It Stops AI From Making Things Up

Large language models will confidently invent a law section that does not exist. Retrieval-Augmented Generation (RAG) is the fix. A plain-English explainer, grounded in a real legal-lookup tool I built for police.

9 June 2026 · 6 min read

Let me start with the problem, because it is a serious one.

Ask a plain chatbot a specific legal question, something like “what is the punishment for theft under the new law,” and it will often give you a clean, confident answer with a section number attached. It sounds exactly right. The grammar is perfect. The tone is authoritative.

And sometimes the section number is completely made up.

Not maliciously. The model is not lying, because it does not know what truth is. In the first post in this series I explained that a large language model is, underneath everything, a next-word predictor. It generates what sounds most plausible based on the text it was trained on. It is not looking anything up. So when it does not actually know the answer, it does not stop and say so. It keeps generating fluent, confident, plausible words. That is what people call a hallucination, and it is the single biggest reason you cannot just point a raw chatbot at serious work like law or policing.

So the question becomes: how do you get the fluency of an LLM without the bluffing? That is what RAG is for.

The one-line idea

RAG stands for Retrieval-Augmented Generation. Ignore the jargon. The idea is simple:

Instead of letting the model answer from memory, you first go and fetch the real documents that actually contain the answer, hand them to the model, and tell it “answer using only this, and point to where you got it.”

That is the whole trick. You stop asking the model to remember, and you start asking it to read.

The open-book exam

Here is the way I explain it to people who do not work in tech.

Picture a student in a closed-book exam who did not study. They will not leave the answer blank. They will write something that sounds right and hope for the best. That is a raw LLM.

Now give that same student an open book, point them to the right page, and tell them to quote from it. The bluffing stops instantly. They are not relying on a shaky memory anymore. They are reading the real thing in front of them and putting it in their own words.

RAG turns a closed-book exam into an open-book one. The model is still doing the explaining, which is what it is good at, but the facts now come from a real document instead of a fuzzy memory.

How it actually works, in three steps

You do not need the maths to get this. It is three steps.

Step one: store. You take your real documents, the actual text of the law for example, and chop them into small pieces. Then you index those pieces so they can be searched by meaning, not just by exact words. To do that, you turn each piece of text into a kind of fingerprint of its meaning (an embedding, in the jargon), so that two passages that say similar things sit close together even if they use different words. That last part matters, because a citizen asking about “stealing” should still find the section that talks about “theft.”
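For readers who like to see it in code, here is a toy sketch of the store step. It is an illustration under loud assumptions, not how any production system does it: a real index would get its fingerprints from a trained embedding model, whereas the hypothetical CONCEPTS map below is a hand-written stand-in that simply folds “stealing” and “theft” onto the same concept. The function names and the sample sentences are mine and are placeholders, not text from any law.

```python
import math
from collections import Counter

# ASSUMPTION: a hand-written stand-in for a learned embedding model.
# A real system learns these relationships; here they are hard-coded.
CONCEPTS = {"steal": "theft", "stealing": "theft", "stolen": "theft"}


def chunk(text, size=40):
    """Chop a document into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def fingerprint(text):
    """Turn text into a bag of concepts (a crude meaning fingerprint)."""
    words = [w.strip(".,;:?!\"'()").lower() for w in text.split()]
    return Counter(CONCEPTS.get(w, w) for w in words if w)


def similarity(a, b):
    """Cosine similarity between two fingerprints, from 0.0 to 1.0."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# The question says "stealing", the placeholder passage says "theft",
# yet the two fingerprints still overlap.
question = fingerprint("What is the punishment for stealing?")
passage = fingerprint("Placeholder passage about theft.")
print(similarity(question, passage) > 0)  # prints True
```

The point of the sketch is the shape of the idea: chunk, fingerprint, compare. Swapping the hand-written map for a real embedding model changes the quality of the match, not the structure of the step.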

Step two: retrieve. A question comes in. The system searches that index and pulls out the handful of pieces that genuinely match what was asked. Not the whole book, just the few paragraphs that are actually relevant.
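A minimal sketch of the retrieve step might look like the following. It assumes a tiny in-memory index and scores passages by plain word overlap, which is a deliberately crude stand-in for the meaning-based search described in step one. The names retrieve, overlap_score, and SAMPLE_INDEX are hypothetical, and the passages are placeholders rather than real legal text.

```python
import re

# ASSUMPTION: placeholder labels and passages, not text from any statute.
SAMPLE_INDEX = {
    "Sample Section A": "placeholder text about theft of movable property",
    "Sample Section B": "placeholder text about forged documents",
    "Sample Section C": "placeholder text about punishment for theft",
}


def tokens(text):
    """Lowercase words only, with punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(question, passage):
    """Fraction of the question's words that also appear in the passage."""
    q = tokens(question)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def retrieve(question, index, k=2, min_score=0.1):
    """Return up to k (label, text) pairs, best match first.

    Passages scoring below min_score are dropped, so an unrelated
    question can come back with nothing at all.
    """
    scored = [(overlap_score(question, text), ref, text)
              for ref, text in index.items()]
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(ref, text) for _, ref, text in scored[:k]]


for ref, text in retrieve("What is the punishment for theft?", SAMPLE_INDEX):
    print(ref, "->", text)
```

Two knobs carry the idea: k keeps the result to a handful of passages, and min_score lets the search return an empty list when nothing is close enough.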

Step three: generate. Now you hand those few real paragraphs to the LLM along with the question, and you instruct it plainly: answer from these passages, and if the answer is not in them, say so. The model writes a clean, plain-language reply, grounded in the actual text, with the source sitting right next to it.
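The instruction in step three can be sketched as a prompt builder. This is one possible wording, not a recommended or tested prompt, and it stops short of calling any model because that part depends on whichever LLM you use. The function name build_prompt and the refusal sentence are my own assumptions.

```python
def build_prompt(question, passages):
    """Assemble a grounded prompt from (label, text) passages.

    ASSUMPTION: the wording below is illustrative only. The returned
    string would be sent to whichever LLM the system uses.
    """
    sources = "\n".join(f"[{ref}] {text}" for ref, text in passages)
    return (
        "Answer the question using only the passages below.\n"
        "After each claim, cite the passage label in square brackets.\n"
        "If the answer is not in the passages, reply exactly: "
        "I cannot find this in the provided text.\n\n"
        f"Passages:\n{sources}\n\n"
        f"Question: {question}\n"
    )


# Placeholder passage, not real legal text.
prompt = build_prompt(
    "What is the punishment for theft?",
    [("Sample Section C", "placeholder text about punishment for theft")],
)
print(prompt)
```

Labelling each passage is what makes the citation possible later: the model can only point to a source if the source arrived with a name attached.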

A real example: looking up the law

This is not theory for me. One of the tools I built, which I call Nyaya Mitra, does exactly this for the new criminal laws in India, where the BNS, BNSS, and BSA have replaced the older IPC, CrPC, and Indian Evidence Act. Officers and citizens need to find the right section, understand it in plain language, and see how it maps to the old law.

A raw chatbot would be dangerous here. It might confidently quote a section that was never written. So the tool does not ask the model to remember the law. When someone asks about theft, it searches the real text of the BNS, pulls the actual section, and only then does the model explain it simply and show exactly which section it came from. If the answer is not in the law, it says it cannot find it, instead of inventing something that sounds official.
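The “say it cannot find it” behaviour can be sketched as a small guard around the pipeline. To be clear, this is not the code behind Nyaya Mitra. It is a generic illustration in which the retriever and the model are passed in as hypothetical functions, so the only thing shown is the control flow: no passages, no generation.

```python
NOT_FOUND = "I cannot find this in the provided text."


def answer(question, retrieve, generate):
    """Answer only when retrieval found something to read.

    ASSUMPTION: `retrieve(question)` returns a list of (label, text)
    pairs and `generate(question, passages)` wraps an LLM call. Both
    are hypothetical and supplied by the caller.
    """
    passages = retrieve(question)
    if not passages:
        # Nothing relevant in the documents: refuse rather than guess.
        # The model is never called on this path.
        return {"answer": NOT_FOUND, "sources": []}
    text = generate(question, passages)
    return {"answer": text, "sources": [ref for ref, _ in passages]}


# Stand-ins so the sketch runs without a real index or model.
def empty_retriever(question):
    return []


def echo_model(question, passages):
    return "Explained from " + passages[0][0]


print(answer("What about something not covered?", empty_retriever, echo_model))
```

Returning the source labels next to the answer is what lets an interface show the reader which section to open and verify.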

That difference, between a confident guess and a grounded answer you can click through and verify, is the entire reason this approach is safe enough to put in front of someone doing real work.

In ordinary chit-chat, a hallucination is worth a shrug. In law, policing, healthcare, or defence, a confident wrong answer is a real problem. RAG gives you three things those fields actually need.

It gives you citations, so every answer points back to a real source you can check yourself.

It keeps you on the current text, because you control the documents you feed it. When the law changes, you update the documents, not the model. The shift from the IPC to the BNS is a perfect example of why that flexibility matters.

And it lets the system honestly say “I do not know.” A grounded system that admits it cannot find something is far more trustworthy than a fluent one that never stops talking.
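The citation point lends itself to a simple automatic check: did the answer cite only passages that were actually retrieved? The sketch below assumes answers cite sources as labels in square brackets, which is a convention I am inventing for illustration, and the function name check_citations is hypothetical.

```python
import re


def check_citations(answer_text, retrieved_refs):
    """Compare labels cited in an answer against what was retrieved.

    ASSUMPTION: citations appear as [Label] in the answer text.
    """
    cited = set(re.findall(r"\[([^\]]+)\]", answer_text))
    allowed = set(retrieved_refs)
    return {
        "cited": sorted(cited & allowed),
        "unknown": sorted(cited - allowed),
        # Grounded means at least one citation and no unknown labels.
        "grounded": bool(cited) and cited <= allowed,
    }


report = check_citations(
    "Theft is covered here [Sample Section A].",
    ["Sample Section A", "Sample Section C"],
)
print(report)
```

A check like this only confirms that the label exists among the retrieved passages. It cannot tell you whether the passage really supports the claim, which is why a human still reads the source.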

The honest limits

RAG is a fix, not a miracle, so let me be straight about where it can still go wrong.

It is only as good as what you put into it. If your documents are out of date or incomplete, the answers will be too. The old “garbage in, garbage out” rule never retires.

It also depends on retrieval grabbing the right passage. If the search pulls the wrong section, the model will faithfully and convincingly explain the wrong thing. The bluffing moves one step back rather than disappearing entirely.
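One way to keep an eye on that failure mode is to test retrieval on its own, before the model is involved. The sketch below assumes you have a small list of questions paired with the label of the passage that should come back. The retriever is passed in as a hypothetical function, and hit_rate is my own name for the measure.

```python
def hit_rate(test_cases, retrieve, k=3):
    """Share of questions whose expected passage is in the top k results.

    ASSUMPTION: `test_cases` is a list of (question, expected_label)
    pairs and `retrieve(question)` returns (label, text) pairs with the
    best match first. Both are hypothetical.
    """
    if not test_cases:
        return 0.0
    hits = 0
    for question, expected_label in test_cases:
        top_labels = [ref for ref, _ in retrieve(question)[:k]]
        if expected_label in top_labels:
            hits += 1
    return hits / len(test_cases)


# A stand-in retriever with fixed results, for illustration only.
def fixed_retriever(question):
    canned = {
        "question one": [("Sample A", "text"), ("Sample B", "text")],
        "question two": [("Sample C", "text")],
    }
    return canned.get(question, [])


cases = [("question one", "Sample B"), ("question two", "Sample A")]
print(hit_rate(cases, fixed_retriever, k=2))  # prints 0.5
```

A low score here tells you the problem is in the search, not in the model, which is exactly the distinction this limit is about.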

So even with RAG, the rule from the last post still holds. The human checks the cited source. The model points you to the page. You are the one who reads it and decides.

My take

RAG is the reason I am comfortable putting a language model anywhere near legal work, when a raw chatbot would be reckless. It changes what the model is. It stops being an oracle that answers from a hazy memory and becomes a clerk that looks things up and shows you the page it found.

That is a much smaller, much more honest job. And in the kind of work where a wrong answer has real consequences, smaller and honest is exactly what you want.