Sulabh Sethi · Blog

How an Investigating Officer Can Actually Use Machine Learning: A Practical Guide

Forget the hype. A grounded, plain-English guide to how the boring, explainable machine learning models can help a real investigating officer sort a pile of cases, spot the odd one out, and defend the reasoning, without ever pretending to be a detective.

8 June 2026 · 6 min read

In the last two posts I explained what these models are, and then which of the boring, old ones I actually reach for. The question I kept getting back was the practical one: “This is interesting, but I have two hundred pending cases, no data scientist, and no time. How would I ever use any of this?”

That is the right question, and it deserves a straight answer. So let me put on the investigating officer’s hat and walk through where these models genuinely help, and just as importantly, where they do not.

First, the mindset. A model is not a detective. It does not solve your case and it does not know anything about justice. The honest way to picture it is this: a very fast, very patient junior assistant who has read every old case file in the building and never gets tired. It cannot tell you who is guilty. But it can tell you where to look first. For an officer drowning in paperwork, that alone is worth a lot.

Here are the places it earns its keep.

1. Sorting the pile

You walk in and there are two hundred items in the queue. Complaints, tip-offs, pending follow-ups. Your day has time for maybe twenty of them done properly. The question is not “who is guilty,” it is “which twenty should I open first.”

This is exactly what these models are good at. You feed in the things you already record, like how recent the incident is, how serious, whether there is a witness, whether it links to anything else, and the model ranks the pile by how likely each item is to need attention now. It is a smarter to-do list. A simple model like logistic regression or a single decision tree will even tell you, in plain terms, why it pushed something to the top. “Flagged because it is recent, serious, and has a named witness.” You stay in charge. It just stops you wasting your best hours on the wrong files.
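To make the "smarter to-do list" concrete, here is a minimal sketch of a logistic-style score that ranks a queue and says why. The field names, the weights, and the three sample items are all made up for illustration; in practice the weights would be fitted to your own past records (for example with logistic regression), not typed in by hand.

```python
import math

# Hypothetical weights for illustration only. A real model would learn
# these from past records instead of having them set by hand.
WEIGHTS = {"recent": 1.2, "serious": 1.5, "has_witness": 0.8, "linked": 0.6}
BIAS = -2.0

def score(item):
    """Return (score between 0 and 1, contribution of each factor)."""
    contributions = {name: w * item[name] for name, w in WEIGHTS.items()}
    z = BIAS + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-z)), contributions

def explain(item):
    """List the factors that pushed the score up, strongest first."""
    _, contributions = score(item)
    ordered = sorted(contributions.items(), key=lambda kv: -kv[1])
    active = [name for name, value in ordered if value > 0]
    if not active:
        return "No factor raised the score"
    return "Flagged because: " + ", ".join(active)

# Made-up queue items: 1 means the factor is present, 0 means it is not.
queue = [
    {"id": "A", "recent": 1, "serious": 1, "has_witness": 1, "linked": 0},
    {"id": "B", "recent": 0, "serious": 0, "has_witness": 0, "linked": 1},
    {"id": "C", "recent": 1, "serious": 0, "has_witness": 0, "linked": 0},
]

# Highest score first: this is the order in which to open the files.
ranked = sorted(queue, key=lambda item: score(item)[0], reverse=True)
for item in ranked:
    print(item["id"], round(score(item)[0], 2), explain(item))
```

The point of the sketch is the second function, not the first. Because the score is just a weighted sum, every ranking comes with a readable reason attached.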

2. Spotting the odd one out

A lot of investigation is noticing the thing that does not fit. The transaction that is the wrong size. The call that happens at a strange hour. The one entry in a long list that breaks the pattern.

A human is brilliant at this over ten records and useless over ten thousand. A model is the other way around. Show a model what normal looks like across a huge pile of transactions or call records, and it will quietly point at the handful that stand out. It will not tell you the standout is a crime. It will tell you “this one is unusual, you might want to look.” In a financial fraud case, that can turn weeks of squinting at spreadsheets into an afternoon.
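One simple way to do this, sketched below, is the modified z-score: measure how far each value sits from the median, in units of the typical deviation. This is only one of several possible techniques, and I am assuming a single numeric column. The amounts are invented, and the 3.5 cutoff is a common rule of thumb rather than a fixed standard.

```python
from statistics import median

def unusual(values, threshold=3.5):
    """Return the positions of values that sit far from the median.

    Uses the modified z-score, which is based on the median and the
    median absolute deviation (MAD), so one huge value cannot hide itself
    by dragging the average up.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values are identical: no usable spread.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Invented transaction amounts; one is clearly out of pattern.
amounts = [1200, 1150, 1300, 1250, 1180, 98000, 1220, 1275]
flagged = unusual(amounts)
print(flagged)  # positions worth a second look, not findings of wrongdoing
```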

3. Connecting cases that nobody linked

Crimes by the same hand often share a fingerprint in the method. Same approach, same timing, same small habits. When those cases are spread across different officers, stations, and months, no single person ever sees them side by side.

A model that compares cases on the details you record can surface “these five look like they belong together.” That is not proof they are connected. It is a lead. But it is the kind of lead that used to depend entirely on one officer happening to remember an old file, and now does not.
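As a rough sketch of how that comparison can work, assume each case has been reduced to a set of short method tags. Jaccard similarity then measures how much two sets overlap: shared tags divided by all tags across both. The case IDs, the tags, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two tag sets: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_links(cases, threshold=0.5):
    """Return (case, case, similarity) for every pair at or above the threshold."""
    links = []
    for id_a, id_b in combinations(sorted(cases), 2):
        similarity = round(jaccard(cases[id_a], cases[id_b]), 2)
        if similarity >= threshold:
            links.append((id_a, id_b, similarity))
    return links

# Made-up cases described by the method details an officer might record.
cases = {
    "case-101": {"night", "rear window", "cash only", "weekday"},
    "case-207": {"night", "rear window", "cash only", "weekend"},
    "case-315": {"daytime", "front door", "electronics"},
}

links = candidate_links(cases)
print(links)  # leads to review side by side, not proof of a connection
```

Comparing every pair is fine for a few hundred cases. For a much larger archive you would need something smarter, but the idea stays the same.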

4. A second pair of eyes on the paperwork

This one is unglamorous and saves real time. The same models can catch duplicate records, flag entries that look mistyped, and notice when a required field is missing or contradicts another. Before a chargesheet goes up, that quiet check catches the small errors that otherwise come back to bite you later.
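Much of this check does not even need a trained model. Plain rules catch a lot. Here is a minimal sketch, assuming a record is a dictionary with the four fields shown; the field names, the duplicate rule (same complainant and same incident date), and the sample records are all invented for illustration.

```python
from datetime import date

# Hypothetical list of fields that must be filled in.
REQUIRED = ("case_id", "complainant", "incident_date", "report_date")

def check_records(records):
    """Return a list of (record position, problem description)."""
    problems = []
    seen = {}  # (normalised complainant, incident date) -> first position
    for idx, rec in enumerate(records):
        # 1. Required fields that are absent or empty.
        for field in REQUIRED:
            if not rec.get(field):
                problems.append((idx, f"missing {field}"))
        # 2. Fields that contradict each other.
        incident, report = rec.get("incident_date"), rec.get("report_date")
        if incident and report and report < incident:
            problems.append((idx, "report_date is before incident_date"))
        # 3. Likely duplicates, ignoring case and stray spaces in the name.
        name = str(rec.get("complainant") or "").strip().lower()
        key = (name, incident)
        if name and key in seen:
            problems.append((idx, f"possible duplicate of record {seen[key]}"))
        else:
            seen.setdefault(key, idx)
    return problems

# Invented sample records.
records = [
    {"case_id": "C1", "complainant": "R. Kumar",
     "incident_date": date(2026, 5, 1), "report_date": date(2026, 5, 2)},
    {"case_id": "C2", "complainant": "r. kumar ",
     "incident_date": date(2026, 5, 1), "report_date": date(2026, 5, 3)},
    {"case_id": "C3", "complainant": "",
     "incident_date": date(2026, 5, 10), "report_date": date(2026, 5, 9)},
]

problems = check_records(records)
for idx, problem in problems:
    print(idx, problem)
```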

5. Explaining your reasoning

This is the one I care about most, and it is why I keep pushing the boring models over the flashy ones.

If a model helps you prioritise a case, someone senior, or eventually a court, may ask why. With the older models you can answer honestly, because the reasoning is readable. “It was ranked high because of these three factors.” You cannot say that as easily about a giant neural network, which is largely a black box, often even to the people who built it. In our line of work, a decision you cannot explain is a decision you cannot defend. So for anything that touches a real person, I will take the simple, transparent model every time, even if it is a hair less accurate.

The part nobody should skip

Now the warnings, because this is policing and getting it wrong has a cost.

The model learns from your past records, and it inherits their bias. If certain areas or groups were over-policed before, the model will quietly assume they need more attention now, and dress that old bias up as a neutral number. The model is not objective. It is a mirror of the data you fed it. Treat its output as a hunch from a junior, not a fact from an oracle.
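One modest way to keep an eye on this, sketched below, is to compare how often the model flags items from each area. The area names and the flags are invented. A gap between areas does not prove bias on its own, because the underlying cases may genuinely differ, but it is a prompt to go and ask why.

```python
from collections import defaultdict

def flag_rates(items):
    """Share of items the model flagged, per group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in items:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / totals[group] for group in totals}

# Invented (area, was this item flagged?) pairs.
items = [
    ("north", True), ("north", True), ("north", False), ("north", True),
    ("south", False), ("south", True), ("south", False), ("south", False),
]

rates = flag_rates(items)
print(rates)  # a large gap is a question to investigate, not a conclusion
```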

The model points, the human decides. A score is a reason to look, never a reason to act against someone. It is not evidence. It does not belong in a chargesheet as proof of anything. The moment a number starts deciding who gets stopped or charged, you have stopped doing investigation and started outsourcing your judgment, and the law will be right to ask hard questions.

Keep it to where you are sorting work, not judging people. Using a model to decide which files to open first is sensible. Using it to predict that a named individual will commit a crime is a different thing entirely, and it is a road I would not walk down. The officer, the evidence, and the law come first. The model is a help, full stop.

My honest take

The reason these old, plain models fit police work so well is the same reason they look boring. They are fast, they are cheap, and you can read exactly why they said what they said. An investigating officer does not need a magic box that hands down verdicts. They need a tireless assistant that sorts the pile, flags the odd one out, and can always explain itself when asked.

Used that way, with a human firmly in charge, even a decades-old model earns its place in the room. Used the other way, as an oracle you stop questioning, it becomes a liability dressed up as progress. The difference is not the technology. It is how you hold it.