Published on March 24, 2026

SmartSearch: Why Smart Ranking Beats Perfect Order in AI Memory

How SmartSearch learned to pull signal from the noise of messy dialogues without complex algorithms – and why effective ranking is more valuable than a perfectly organized data archive.

Computer Science 12 – 17 min read

Author: Dr. Rafael Santos

«While working on this piece, I caught myself thinking: we're so used to believing that order is always better than chaos that we sometimes fail to see how we create extra work for ourselves. SmartSearch is like that player on the field who doesn't show off with flashy footwork but just always ends up in the right place at the right time. I wonder if this approach will change how the industry thinks about data preprocessing – or will the attachment to “beautiful archives” prove stronger than common sense?» – Dr. Rafael Santos

Imagine a soccer coach who, before every match, reviews recordings of all their team's previous games from the last few years. Hundreds of games. Thousands of hours of footage. The task: quickly find the moment a specific player scored a goal from the corner of the field, in the rain, under pressure from a defender on the left. You could, of course, meticulously organize all the recordings into folders, number them, create a catalog – a process that would take weeks. Or, you could learn to quickly search by meaning and find what you need in seconds, even if the recordings are stored haphazardly. SmartSearch is exactly that second approach. And it works better.

Why AI Needs “Long-Term Memory”

When you talk to a voice assistant or chatbot over several weeks or months, you accumulate a huge amount of information. You've mentioned your name, talked about your job, complained about the weather, and discussed vacation plans. A good assistant should remember all of this – and be able to use it at the right moment.

But here's the problem: conversations are long, information is jumbled, and the context an assistant can “hold in its head” at one time is limited. Large language models – the neural networks that can generate text – have what's called a token limit. A token is a small chunk of text: roughly a short word or a piece of a longer one. If a dialogue is too long, the model simply can't process it all at once.

It's like trying to read an 800-page novel when you're only allowed to hold 50 pages at a time. You have to choose which pages to pick up – and hope you don't make the wrong choice.

This is precisely why artificial intelligence researchers have long been developing conversational memory systems – mechanisms that can extract only the most essential information from a long dialogue history and feed it to the model at just the right moment.

The Old Way: Expensive, Complex, but Familiar

The traditional approach to this problem looks something like this. As an assistant holds a conversation, each new piece of dialogue is immediately processed and structured. A large language model reads the raw text and converts it into a neat set of facts: “The user's name is Andrew. He lives in Moscow. He likes jazz. He works as an architect.” What you get is something like a dossier – tidy, orderly, and easy to search.

Then, when the user asks a question, the system consults this dossier to find the answer. Sometimes, this involves vector databases – mathematical spaces where words and phrases are turned into numerical coordinates, and semantically similar fragments end up “close” to each other. Other times, knowledge graphs are built – diagrams of connections between people, events, and objects mentioned in the conversation.

It all sounds great. But these approaches come with serious drawbacks.

Expensive Structuring. Every time a new chunk of dialogue comes in, it has to be processed by a large language model. This requires computational resources, time, and money – especially with millions of conversations.
Loss of Detail. When we summarize something in our own words, we inevitably lose nuances. The same thing happens with automatic structuring: subtle details, important qualifications, and emotional context can all disappear in the process of “translating” a living conversation into a set of dry facts.
Dependency on Training Data. Systems that use trained models for retrieval require a large number of labeled examples. No data, no reliable performance.
Poor Generalizability. A model tuned for one type of dialogue may perform poorly with another.

It was against this backdrop of shortcomings that SmartSearch emerged – a system that decided to take a completely different path.

SmartSearch: A Dance Without Rehearsal

The core idea of SmartSearch is simple, even a little audacious: there's no need to pre-structure the dialogue history. At all. Let it be stored as is – as raw text, like a stream of living speech. Then, when a question comes in, the system will search for the answer directly within that stream, quickly and intelligently.

It's like a seasoned musician who doesn't memorize the score before a performance but can sight-read any note and play it perfectly at the right moment. No prior preparation – just mastery of execution.

The SmartSearch pipeline consists of three stages. Let's break down each one so that it's clear even without an engineering degree.

Step One: Find Everything Possible 🔍

When a user asks a question, SmartSearch doesn't just look for words from the question in the dialogue text. The system first identifies named entities from the question – these are names of people, places, dates, organizations, and other specific objects. This process is called Named Entity Recognition, or NER.

For example, if the question is “What did Andrew say about his trip to St. Petersburg last year?”, the named entities are “Andrew” and “St. Petersburg.” The system isolates them and gives them more weight in the search than common words like “say” or “trip.”
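To make the weighting idea concrete, here is a minimal sketch. It assumes a toy capitalization heuristic in place of a real NER model, and the names `extract_entities`, `weight_terms`, the stopword list, and the weights 3.0 and 1.0 are all illustrative choices, not details from the study.

```python
import re

# Hypothetical stopword list; a real system would rely on a proper NER model.
STOPWORDS = {"what", "did", "about", "his", "to", "the", "last", "a", "an", "of", "in"}
ENTITY_WEIGHT = 3.0   # assumed weight for named entities
COMMON_WEIGHT = 1.0   # assumed weight for ordinary words

def extract_entities(question: str) -> list[str]:
    """Toy stand-in for NER: runs of capitalized words such as 'St. Petersburg'."""
    pattern = r"[A-Z][a-z]+\.?(?:\s[A-Z][a-z]+\.?)*"
    candidates = re.findall(pattern, question)
    # Drop capitalized sentence openers like "What".
    return [c for c in candidates if c.lower() not in STOPWORDS]

def weight_terms(question: str) -> dict[str, float]:
    """Give entities a higher weight than the remaining content words."""
    entities = extract_entities(question)
    weights = {entity: ENTITY_WEIGHT for entity in entities}
    entity_words = {w.lower() for e in entities for w in re.findall(r"[A-Za-z]+", e)}
    for word in re.findall(r"[A-Za-z]+", question):
        lowered = word.lower()
        if lowered not in STOPWORDS and lowered not in entity_words:
            weights.setdefault(lowered, COMMON_WEIGHT)
    return weights

question = "What did Andrew say about his trip to St. Petersburg last year?"
print(extract_entities(question))
print(weight_terms(question))
```

The entities end up with three times the weight of words like “say” or “trip”, which is the whole point: a fragment that mentions Andrew and St. Petersburg should outrank one that merely mentions a trip.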

Next comes a substring search – a literal search for matching text in the dialogue history. It's simple, fast, and reliable. No neural networks at this stage – just text and rules.
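A literal search like this fits in a few lines. The sketch below is only an illustration under assumptions: the function name `substring_search`, the case-insensitive matching, and the additive scoring are hypothetical, not taken from the study.

```python
def substring_search(fragments: list[str], weighted_terms: dict[str, float]) -> list[tuple[int, float]]:
    """Return (fragment index, score) for every fragment that contains at least one term."""
    hits = []
    for index, text in enumerate(fragments):
        lowered = text.lower()
        # Add up the weights of all terms that literally occur in the fragment.
        score = sum(weight for term, weight in weighted_terms.items()
                    if term.lower() in lowered)
        if score > 0:
            hits.append((index, score))
    return hits

fragments = [
    "Andrew: I went to St. Petersburg last spring.",
    "Kate: The weather is awful today.",
    "Andrew: The trip was great.",
]
terms = {"Andrew": 3.0, "St. Petersburg": 3.0, "trip": 1.0}
print(substring_search(fragments, terms))
```

Note that nothing is filtered out aggressively here: any fragment with a single match stays in the pool, because the goal of this stage is coverage, not precision.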

The result: the system finds all dialogue fragments containing the necessary words or entities. At this stage, it's better to find too much than to miss something important. And SmartSearch performs brilliantly: the study found that in the initial retrieval stage, the system finds 98.6% of all relevant information, a metric known as recall. A nearly perfect result.

Step Two: Cast a Wider Net 🕸️

Sometimes, the answer to a question isn't in one place. The information is scattered across several parts of the dialogue, and to piece it together, you need to take a few “steps.”

Imagine a user asks about “Marina's best friend.” The dialogue might mention somewhere that “Marina always goes to the movies with Kate”, and in another place, that “Kate recently moved to Yekaterinburg.” To answer the question fully, you need to connect these two fragments.

SmartSearch does this through entity expansion. The system analyzes the fragments it has already found, extracts new names and objects from them, and runs the search again, this time with the new terms. This process repeats several times, like ripples on water: each new step expands the scope, adding related fragments. The rules for this expansion are handwritten – no training, no neural network. Just logic and grammar.

It's like the “six degrees of separation” game: you can get from one person to another through a chain of acquaintances. SmartSearch does the same thing with information.
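The ripple effect can be sketched as a short loop. This is a simplified illustration, not the system's actual rule set: `find_names` is a toy capitalized-word extractor, and `expand_search` and its `max_hops` parameter are hypothetical names.

```python
import re

def find_names(text: str) -> set[str]:
    """Toy entity extractor: any capitalized word (an assumption, not real NER)."""
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))

def expand_search(fragments: list[str], seed_entities: set[str], max_hops: int = 2) -> list[int]:
    """Repeat the search, each time using entities discovered in the previous hits."""
    known = set(seed_entities)      # every entity searched so far
    frontier = set(seed_entities)   # entities to search for in this hop
    found: set[int] = set()         # indices of fragments collected so far
    for _ in range(max_hops):
        new_hits = {i for i, text in enumerate(fragments)
                    if i not in found and any(entity in text for entity in frontier)}
        if not new_hits:
            break
        found |= new_hits
        discovered = set().union(*(find_names(fragments[i]) for i in new_hits))
        frontier = discovered - known
        known |= frontier
        if not frontier:
            break
    return sorted(found)

fragments = [
    "Marina always goes to the movies with Kate.",
    "Kate recently moved to Yekaterinburg.",
    "it rained all week.",
]
print(expand_search(fragments, {"Marina"}))              # two hops
print(expand_search(fragments, {"Marina"}, max_hops=1))  # one hop only
```

With a single hop, only the fragment about Marina is found. The second hop picks up “Kate” from that fragment and uses it to reach the sentence about Yekaterinburg.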

Step Three: Rank with Intelligence 🎯

Now we get to the most important – and most interesting – part. After the first two stages, the system has a large set of dialogue fragments: some are definitely relevant, some are probably relevant, and some are not relevant at all. It's time to pick the best ones.

At this stage, SmartSearch uses two trained ranking tools that work in tandem.

The first is ColBERT. This is a model that can compare a query and a text fragment at the level of individual tokens (words and word pieces). It doesn't just check for matches; it assesses how much each token in the query “resonates” with each token in the fragment. This is far more accurate than simply counting matches. ColBERT works quickly and can handle a large number of candidates.
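This kind of token-by-token comparison is usually described as “late interaction”: for every query token, take its best match among the fragment's tokens, then add those best matches up. Here is a minimal sketch of that scoring step. The two-dimensional vectors are hand-made toys; in the real model the vectors come from a trained encoder.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """For each query token, keep its best match in the fragment, then sum."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings (assumed for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
fragment_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
fragment_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first one

print(maxsim_score(query, fragment_a))
print(maxsim_score(query, fragment_b))
```

Fragment A scores higher because every query token finds a strong partner in it, while fragment B leaves the second query token without a good match.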

The second is a CrossEncoder. This is a more powerful model: it takes the query and fragment together and evaluates their relevance as a single unit, considering subtle semantic connections. It's more accurate but slower – which is why it's only applied to the top candidates already selected by ColBERT.
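The “fast model first, slow model on the shortlist” pattern can be sketched without any neural network at all. In the example below, both scorers are toy stand-ins (simple word overlap and a keyword check), and `cascade_rerank` and `top_k` are hypothetical names. The only thing the sketch demonstrates is that the expensive scorer runs on a handful of candidates rather than on all of them.

```python
from typing import Callable

def cascade_rerank(candidates: list[str],
                   cheap_score: Callable[[str], float],
                   expensive_score: Callable[[str], float],
                   top_k: int = 3) -> list[str]:
    """Shortlist with the cheap scorer, then reorder the shortlist with the expensive one."""
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:top_k]
    return sorted(shortlist, key=expensive_score, reverse=True)

QUERY_TERMS = {"andrew", "trip", "petersburg"}
expensive_calls: list[str] = []  # records which fragments the slow scorer saw

def cheap(fragment: str) -> float:
    """Toy fast scorer: number of query terms present in the fragment."""
    return len(QUERY_TERMS & set(fragment.lower().split()))

def expensive(fragment: str) -> float:
    """Toy slow scorer: stands in for a model that reads query and fragment together."""
    expensive_calls.append(fragment)
    return 1.0 if "petersburg" in fragment.lower() else 0.0

fragments = [
    "Andrew said the trip to St. Petersburg was cold",
    "Andrew likes jazz",
    "The trip was long",
    "Kate moved away",
    "Andrew planned a trip to Kazan",
]
ranked = cascade_rerank(fragments, cheap, expensive, top_k=3)
print(ranked[0])
print(len(expensive_calls), "of", len(fragments), "fragments reached the slow scorer")
```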

The final score for each fragment is a combination of the results from both tools. This is called rank fusion. It's like a jury of two judges: one evaluates quickly and broadly, the other slowly but very carefully. Together, they deliver a more accurate verdict than either could alone.
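There are several ways to combine two rankings. The sketch below uses reciprocal rank fusion, a common and simple technique, purely as an illustration: this article does not specify which fusion formula SmartSearch uses, and the constant `k=60` is just a conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine several ranked lists: each item earns 1 / (k + rank) from every list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical orderings produced by the two judges.
fast_judge = ["f1", "f2", "f3"]
careful_judge = ["f2", "f3", "f1"]

print(reciprocal_rank_fusion([fast_judge, careful_judge]))
```

Fragment `f2` wins because both judges place it near the top, even though only one of them ranks it first.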

And all of this – including both ranking tools – runs on a standard CPU and takes about 650 milliseconds. Less than a second. For a conversational assistant, that's perfectly acceptable.

Smart Truncation: Taking Only What's Needed

After ranking, the system must pass the selected fragments to the large language model that will form the final answer. But the model has a limited “pocket” – that token limit we talked about. This means a decision has to be made about how many fragments to include.

The traditional method: take the top N fragments, and that's it. Simple, predictable, but imprecise. If the cutoff happens in the middle of a crucial piece of information, the answer will be incomplete. If you include too much, the model drowns in noise.

SmartSearch uses adaptive score-based truncation. The system looks at the relevance scores assigned by the CrossEncoder to each fragment. As long as the score is high, it keeps taking them. As soon as the score drops sharply, it stops. This creates a dynamic boundary that adjusts to the specific query and dialogue.
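A cutoff of this kind can be sketched as follows. The exact stopping rule is an assumption: here “drops sharply” is read as “falls by more than `max_drop` compared to the previous fragment”, and both `max_drop` and `min_keep` are hypothetical parameters.

```python
def adaptive_truncate(scored: list[tuple[str, float]],
                      max_drop: float = 0.3,
                      min_keep: int = 1) -> list[str]:
    """Keep fragments (sorted by score, best first) until the score drops sharply."""
    kept = []
    for i, (fragment, score) in enumerate(scored):
        previous_score = scored[i - 1][1] if i > 0 else score
        if i >= min_keep and previous_score - score > max_drop:
            break  # sharp drop: everything from here on is treated as noise
        kept.append(fragment)
    return kept

scored = [("a", 0.95), ("b", 0.91), ("c", 0.88), ("d", 0.40), ("e", 0.38)]
print(adaptive_truncate(scored))
```

In this toy example the boundary lands after the third fragment, where the score falls from 0.88 to 0.40. For another query the same rule might keep one fragment or ten, which is exactly what a fixed top N cannot do.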

The result is impressive: on average, SmartSearch sends 8.5 times fewer tokens to the model than systems that just feed it all available context. And the quality of the answers doesn't drop – it actually improves. That's because the model focuses on what's truly important instead of trying to make sense of a mountain of irrelevant information.

It's like brewing a shot of espresso instead of a bucket of instant coffee: less volume, more concentration, better taste.

Oracle Analysis: Finding the Bottleneck

The researchers conducted an interesting experiment known as an oracle analysis. They tested what would happen in an ideal scenario: the system finds nearly all the necessary fragments (98.6% recall) but then, without any ranking, truncates them to a fixed token limit.

The result was telling: with this approach, only 22.5% of the truly important fragments make it to the final stage. Almost everything valuable is lost – simply because it wasn't near the top of the list.

This experiment clearly shows that finding is only half the battle. Prioritizing correctly is what matters. Imagine you find 200 articles online for a research topic, but you can only read 10. Which ones do you choose? If you pick them randomly, you'll probably miss the most important ones. If you use a smart ranking system, you're far more likely to find what you were looking for.

This is precisely the problem that the ranking stage in SmartSearch solves. It turns a theoretically high recall into practically high-quality answers.
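The effect is easy to reproduce on a toy example. The numbers below are invented for illustration and have nothing to do with the study's 22.5%. The sketch simply measures what share of the relevant fragments fits into a fixed token budget, first in the original order and then with the relevant fragments ranked on top. `surviving_recall` is a hypothetical helper name.

```python
def surviving_recall(fragments: list[tuple[int, int]],
                     relevant_ids: set[int],
                     token_budget: int) -> float:
    """Share of relevant fragments that fit in the budget when taken in list order.

    Each fragment is an (id, token_count) pair.
    """
    used, kept = 0, set()
    for fragment_id, n_tokens in fragments:
        if used + n_tokens > token_budget:
            break
        used += n_tokens
        kept.add(fragment_id)
    return len(kept & relevant_ids) / len(relevant_ids)

# 20 retrieved fragments of 50 tokens each; four of them actually matter.
retrieved = [(i, 50) for i in range(20)]
relevant = {3, 11, 17, 19}

# Same fragments, but with the relevant ones moved to the top (an ideal ranking).
ranked = sorted(retrieved, key=lambda fragment: fragment[0] not in relevant)

print(surviving_recall(retrieved, relevant, token_budget=200))  # unranked
print(surviving_recall(ranked, relevant, token_budget=200))     # ranked
```

Retrieval found everything in both cases. The only difference is the order, and that alone decides whether one relevant fragment or all four reach the model.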

How SmartSearch Performed in Tests

SmartSearch was evaluated on two standard test sets used in the scientific community to compare conversational memory systems.

LoCoMo (Long-Context Memory) is a dataset of very long dialogues filled with “noisy” information irrelevant to the task. The system's challenge is to find exactly what's needed for the answer amidst all this noise.

LongMemEval-S (Long Memory Evaluation, Simplified) is a dataset focused on the precise retrieval of specific facts from long dialogues.

SmartSearch's results:

On LoCoMo – 93.5%
On LongMemEval-S – 88.4%

According to the study, these are the best results among all known systems tested using the same evaluation protocols. Notably, SmartSearch was not fine-tuned for each of these datasets – it operated with the same parameters on both. This indicates that the system generalizes well to different types of tasks.

For comparison, systems that use the full dialogue context (i.e., they pass everything to the model without filtering) use 8.5 times more tokens – and yet they achieve worse results. More isn't better. Especially when it comes to information.

Why This Matters Beyond the Lab

This whole story about SmartSearch isn't just about numbers and benchmarks. Behind it lies a broader idea that's important for understanding where the development of conversational systems is heading.

For a long time, the prevailing wisdom was that the more structured the data fed into a model, the better. The ideal assistant was supposed to first “digest” the entire conversation, file everything away neatly, create a pristine knowledge base – and only then handle queries. SmartSearch shows that this belief isn't an absolute truth.

Sometimes, the best order isn't a perfect archive but the ability to quickly find what's needed in a living, unordered stream. It's like a skilled jazz musician who doesn't play from a score but hears the music and responds to it in real time.

From a practical standpoint, this means:

Lower data preparation costs. No need to spend resources on pre-structuring every dialogue with expensive language models.
Faster deployment. The system can be launched on a new type of dialogue without lengthy tuning and retraining.
Less distortion. The raw dialogue is preserved as is – with all the nuances and details that might be lost during structuring.
Savings on computational resources. Fewer tokens mean lower API costs and faster answer generation.

This makes such an approach particularly attractive for products where response speed and the cost of processing each query are critical – which is to say, for practically any commercial assistant.

Limitations and Open Questions

It would be unfair not to mention that SmartSearch isn't a silver bullet for every problem.

First, the approach relies on substring and named entity matching. This means it works well when the query and the dialogue use the same or similar words. If a user asks about a “vehicle” but the dialogue mentioned a “car”, the system might miss something. This semantic gap remains a vulnerability.
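The gap is easy to see in a two-line experiment. The sentence and the helper `literal_match` below are made up for illustration.

```python
def literal_match(query_term: str, dialogue: str) -> bool:
    """Case-insensitive substring check, the kind of matching the first stage relies on."""
    return query_term.lower() in dialogue.lower()

history = "I finally sold my old car last month."

print(literal_match("car", history))      # same word as in the dialogue
print(literal_match("vehicle", history))  # synonym the dialogue never used
```

The first check succeeds and the second fails, even though a human reader would consider both questions to be about the same thing.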

Second, the study was conducted on two specific test sets. How well SmartSearch will perform with dialogues in other languages, in different domains, or with different communication styles is a separate question that requires further investigation.

Third, the system uses two trained ranking components – ColBERT and a CrossEncoder. While they don't require special tuning for specific data, they are still trained models, and their quality depends on the data they were originally trained on.

Nevertheless, the results from SmartSearch are a compelling signal: in developing memory systems for AI assistants, it's worth investing less in complex preprocessing and more in smart ranking.

The Takeaway: Ranking as the New Superpower

SmartSearch is a story about how the most effective solutions sometimes look deceptively simple. Not building a perfect archive, but learning to quickly find what's needed in a living stream. Not structuring in advance, but ranking intelligently at the moment of the query.

The system achieves results that surpass more complex and resource-intensive alternatives – and it does so faster, cheaper, and with less training data. That's because its creators focused on the right question: not “How can we best store information?” but “How can we best find it?”

Algorithms aren't better than us – they're just different. And sometimes, they find solutions where we go looking for complexity.

#applied analysis #technical context #neural networks #ai development #engineering #scaling #human–machine interaction #model optimization #contextual awareness

Source: https://arxiv.org/abs/2603.15599v1

Original Title: SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

Article Publication Date: Mar 16, 2026

Original Article Authors: Jesper Derehag, Carlos Calva, Timmy Ghiurau


