When a large language model answers a question, someone has to decide whether the answer is good or not. In production systems, this role is increasingly being filled by specialized evaluator models called reward models. They are trained to distinguish good responses from bad ones and help the primary model improve through further training.
It sounds simple, but in practice there are two inconvenient limitations. First, most of these evaluators are trained primarily on English data. Second, they are usually tied to a specific set of criteria – that is, predefined rules about what counts as a good or bad answer. Change the task, and you have to change or retrain the evaluator.
The researchers who presented the MR3 model at the ICLR conference sought to address these two limitations.
What is MR3 and What Makes It Special
MR3 is a new type of evaluator model. The name is short for Multilingual Rubric-Agnostic Reward Reasoning Model: a multilingual evaluation model that does not depend on predefined criteria.
Let's break down what that means.
Multilingual. In terms of language coverage, MR3 goes beyond previous evaluator models in this area. Simply put, the model can evaluate responses not only in English but also in dozens of other languages – which is crucial for systems that serve a multilingual audience.
Rubric-Agnostic. Most evaluators work based on a rubric: there is a list of rules, and the response is checked against each one. MR3 is designed differently – it can make an assessment based on the context of the task, without needing predefined rules about what is considered correct. This makes it more versatile: the same model can be applied in various scenarios without reconfiguration.
Reasoning as Part of the Evaluation. The word reasoning in its name isn't just for show. The model doesn't output a score directly; it first constructs a chain of reasoning that explains why one answer is better than another and notes the strengths and weaknesses of each. This makes the evaluation more transparent and, as a rule, more reliable.
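To make the rubric-agnostic, reasoning-first idea concrete, here is a minimal sketch of how such an evaluator can be queried. This is not the authors' code: the prompt wording, the `generate` call, and the verdict format are assumptions made purely for illustration. The key point is that the prompt carries only the task context and the candidate answers, and the model is asked to reason before it commits to a verdict.

```python
# A minimal sketch (not the authors' implementation) of prompting a generative,
# rubric-agnostic evaluator: no list of rules, just the task and the candidates.
import re

def build_eval_prompt(task: str, answer_a: str, answer_b: str) -> str:
    """Assemble a judging prompt without a predefined rubric."""
    return (
        "You are evaluating two answers to the task below.\n"
        f"Task: {task}\n\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n\n"
        "Think step by step about the strengths and weaknesses of each answer, "
        "then finish with a single line of the form 'Verdict: A' or 'Verdict: B'."
    )

def parse_verdict(model_output: str) -> str | None:
    """Extract the final preference from the model's reasoning text."""
    match = re.search(r"Verdict:\s*([AB])", model_output)
    return match.group(1) if match else None

# `generate` stands in for whatever call invokes the evaluator model
# (a placeholder, not a specific API):
# verdict = parse_verdict(generate(build_eval_prompt(task, a, b)))
```

The reasoning text produced before the verdict is what makes the judgment auditable: a human can read why answer A was preferred, not just that it was.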
Why Is This Needed – and For Whom?
To understand the practical value of MR3, it helps to recall how the process of improving language models works.
Modern large models are trained not only on text from the internet but also with the help of feedback, where the system learns from evaluations of its own responses. This approach is known as Reinforcement Learning from Human Feedback (RLHF) or its automated variations. The evaluator model acts as a judge here: it looks at a response and says how good it is.
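A rough sketch of that judge role, under simplifying assumptions, looks like the following: the evaluator scores candidate responses from the model being trained, and its preferences become the training signal. The callables `policy_sample` and `reward_score` are placeholders here, not a specific library API.

```python
# A minimal sketch of how an evaluator's judgments turn into preference data
# for feedback-based fine-tuning (RLHF-style); names are illustrative only.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str     # response the judge preferred
    rejected: str   # response the judge ranked lower

def collect_preferences(prompts, policy_sample, reward_score):
    """policy_sample(prompt) -> two candidate responses from the model being trained.
    reward_score(prompt, response) -> scalar quality score from the evaluator.
    Both are placeholder callables standing in for real components."""
    pairs = []
    for prompt in prompts:
        resp_a, resp_b = policy_sample(prompt)
        if reward_score(prompt, resp_a) >= reward_score(prompt, resp_b):
            pairs.append(PreferencePair(prompt, chosen=resp_a, rejected=resp_b))
        else:
            pairs.append(PreferencePair(prompt, chosen=resp_b, rejected=resp_a))
    return pairs
```

Everything downstream of this loop inherits the judge's blind spots, which is exactly why its language coverage and flexibility matter.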
If an evaluator works in only one language, the quality of fine-tuning in other languages inevitably suffers. This is particularly problematic for companies and teams building products for a multilingual audience.
Furthermore, if an evaluator is rigidly tied to specific criteria, it has to be retrained every time the task changes. MR3 removes this limitation, as it is capable of adapting to new evaluation conditions without retraining.
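In practical terms, "adapting without retraining" means the evaluation setup changes at the prompt level rather than at the weights level. The sketch below illustrates that idea; the `judge` call and the example criteria are invented for illustration and are not taken from the paper.

```python
# A minimal sketch of swapping evaluation setups without retraining:
# with a rubric-agnostic judge, changing the task means changing prompt text.

def make_instruction(task: str, extra_guidance: str | None = None) -> str:
    """Build an evaluation instruction; optional guidance can be injected,
    but the same evaluator also works when it is left empty."""
    base = f"Evaluate the response to the following task on its own merits.\nTask: {task}"
    return base if extra_guidance is None else f"{base}\nAlso consider: {extra_guidance}"

# Same evaluator, two different scenarios, zero retraining
# (`judge` is a placeholder for a call to the evaluator model):
# judge(make_instruction("Summarize this legal contract"), response)
# judge(make_instruction("Answer the math question", "check the arithmetic"), response)
```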
What This Means for the Industry
The work on MR3 was presented at ICLR – one of the leading conferences in machine learning – which means the approach has passed peer review by the research community.
For researchers and teams developing multilingual systems, MR3 offers an interesting alternative to current solutions. Instead of maintaining separate evaluators for different languages or tasks, they can use a single, more flexible model with broader coverage.
This is especially relevant as language models increasingly expand beyond the English language. The demand for quality assessment tools that work just as well in Spanish, Arabic, or Hindi as they do in English is very real and growing.
Open Questions
Like most research papers, the work on MR3 leaves some questions that have yet to be clarified in practice.
Being rubric-agnostic is one of the model's strengths, but it also creates an area of uncertainty. When an evaluator builds its own evaluation logic without relying on explicit rules, the question arises: how stable and predictable are its judgments in different contexts? Verifying this in real-world production scenarios is more difficult than on test datasets.
Multilingual quality is also unlikely to be uniform: models generally perform better in languages with large amounts of training data. How consistently MR3 handles lower-resource languages is a question that requires separate study.
Nevertheless, the direction in which MR3 is moving seems logical: the quality evaluation of language models should be as flexible and multilingual as the models themselves. And in this regard, MR3 takes a significant step forward.