«When I was writing this article, one question wouldn't let me go: what if we've been looking for universality all this time when what we needed was specialization? Maybe the whole point isn't for the model to be able to do a little bit of everything, but for it to learn to be a genius at one thing–right now, right before your eyes? It's like the difference between someone who knows a thousand songs and someone who can improvise one–but in a way that gives you goosebumps. It's like a DJ who can play anything, but the true master is the one who creates a unique rhythm that captivates everyone here and now.» – Dr. Rafael Santos
Imagine a football player who learns to play not during practice, but right in the middle of the championship final. Sound crazy? That is exactly how TTT-Discover works–a method that allows large language models to learn and improve in real-time while solving a specific scientific problem. It's as if a musician were improvising on stage, becoming more virtuosic with every note, instead of just performing a memorized score.
TTT-Discover: AI Models Learn During Testing
🎭 When Rehearsal and Performance Are the Same Thing
Traditional approaches to artificial intelligence work like classical preparation for the Carnival in Rio: first months of training, memorizing movements, perfecting every step, and then–stepping onto the Sambadrome with a ready-made routine. Language models are trained on terabytes of text, their parameters are frozen, and then they are used to solve tasks. Everything is clear, everything is according to plan. But what if the task is so unique that no amount of preliminary preparation will help? What if you need to improvise?
This is where TTT-Discover enters the stage–Test-Time Training for Discovery. Imagine a percussionist who doesn't just play the drum but learns to feel the instrument better with every beat, adjusting the rhythm to the crowd's mood, experimenting with new patterns. The model doesn't just apply what it knows–it continues to learn, adapting to the specific problem right in the process of solving it.
Previous works, such as AlphaEvolve, used frozen models–that's like dancing samba to a strictly set choreography. TTT-Discover, however, applies reinforcement learning directly during testing. This means the model can change its internal parameters, learning from its mistakes and successes, getting better and better at this one specific task.
How TTT-Discover Works: Learning and Discovery
🥁 The Rhythm of Search: How It Works
Let's figure out how this dance of learning and discovery is arranged. TTT-Discover consists of two main participants working in a pair, like a drummer and a dancer at the carnival.
The Generator–The One Who Sets the Rhythm
The first participant is the large language model, in this case, gpt-oss-120b. This is an open model, not some closed black box from corporate giants. The Generator is like an improviser at a jam session: it looks at the current situation, understands the context of the problem, and proposes new ideas. This could be Python code for an algorithmic task, a CUDA program for GPU kernel optimization, a mathematical formula, or a sequence for a cryptographic challenge.
The model doesn't just output one solution and stop–it generates a multitude of options, experiments, tries different approaches. It's like trying out different rhythmic patterns on a drum to see which one fits the melody best.
The Evaluator–The One Who Gives Scores
The second participant is the evaluator, the external environment that tells the model how good its proposal is. This could be a compiler checking if the code works, a GPU measuring the program's execution speed, a mathematical solver checking the correctness of a proof, or a biological analyzer assessing data processing quality.
The evaluator returns a numerical reward. It's like the audience's applause at a concert: the louder the ovation, the better the performance. This reward becomes the signal for the reinforcement learning algorithm, which corrects the model's behavior.
The Learning Loop: From the First Beat to the Finale
The entire process works cyclically, like a repeating refrain in a musical composition:
- Generating Proposals: The model creates a set of possible solutions or modifications. These can be completely different ideas–from changing a single parameter to a complete overhaul of the approach. Variety is critical here, like the variety of moves in samba.
- Evaluating Proposals: Each idea is sent for a check-up with the evaluator. For a GPU kernel, this would be execution time–the faster, the better. For a math problem–how close the solution is to a proof or an optimal result. Each solution is assigned a reward.
- Updating the Model: Here is where the magic happens. Based on the rewards received, the model updates its internal weights using reinforcement learning algorithms like PPO (Proximal Policy Optimization). It's as if a dancer analyzed the crowd's reaction after every performance and adjusted their moves for the next time. Strategies that led to high rewards are amplified. Those that failed are weakened.
- Iteration: The cycle repeats again and again. With every round, the model becomes more specialized in solving exactly this task, diving deeper into the solution space like an explorer in the Amazon jungle.
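The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `generate`, `evaluate`, and `update` are hypothetical stand-ins for the language model, the external evaluator, and the reinforcement-learning weight update.

```python
import random

def ttt_discover_loop(generate, evaluate, update, rounds=50, proposals=4):
    """Minimal sketch of the generate-evaluate-update cycle described above."""
    best, best_reward = None, float("-inf")
    for _ in range(rounds):
        # 1. Generating proposals: sample several candidate solutions.
        candidates = [generate() for _ in range(proposals)]
        # 2. Evaluating proposals: the external environment scores each one.
        rewards = [evaluate(c) for c in candidates]
        # 3. Updating the model: reinforce whatever scored well.
        update(candidates, rewards)
        # 4. Iteration: keep the best solution found so far and repeat.
        for c, r in zip(candidates, rewards):
            if r > best_reward:
                best, best_reward = c, r
    return best, best_reward

# Toy stand-ins: "learn" to propose numbers close to a hidden target of 5.
random.seed(0)
state = {"center": 0.0}
generate = lambda: state["center"] + random.uniform(-1.0, 1.0)
evaluate = lambda x: -abs(x - 5.0)  # reward peaks at the target

def update(candidates, rewards):
    # Crude policy update: recenter future proposals on the best candidate.
    state["center"] = candidates[rewards.index(max(rewards))]

best, best_reward = ttt_discover_loop(generate, evaluate, update)
```

Real TTT-Discover replaces the toy `update` with a PPO-style gradient step on the model's weights, but the shape of the loop stays the same.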
Key Features of TTT-Discover Methodology
⚡ TTT-Discover's Signature Moves
What makes this method truly special? Let's look at its key characteristics–those very elements that turn an ordinary algorithm into a virtuoso solo.
Single-Mindedness–One Shot, One Goal
Unlike traditional machine learning, where a model learns to be good “on average” across a multitude of tasks, TTT-Discover focuses on achieving excellence in one specific problem. It is not a universal soldier–it is a sniper. Instead of learning to play a hundred songs decently, the model learns to perform one composition perfectly.
This is a fundamentally different philosophy. When you train a model on a wide spectrum of data, you sacrifice depth for breadth. Here, we sacrifice breadth for maximum depth and specialization. The result? Solutions that surpass everything created by universal approaches.
Continuous Improvement–A Dance Without Stopping
The model never freezes. It constantly evolves, like a living organism. Every new attempt, every new reward is a chance to get better. It's like a football match where the team constantly adjusts tactics right during the game based on the opponent's actions.
In the traditional approach, after training, the model is fixed–its weights no longer change. Here, however, the weights continue to update, and the model continues to learn until it finds an optimal or near-optimal solution. It is a continuous process of perfection.
Balancing Exploration and Exploitation–Risk and Calculation
Reinforcement learning algorithms naturally balance between two strategies: exploration (venturing into new, uncharted territories) and exploitation (using already found promising paths). It's like the choice between trying a new drum pattern that might turn out to be genius (or a total flop) and refining a rhythm that already works.
Too much exploration–and you waste time on useless ideas. Too much exploitation–and you get stuck in a local optimum, missing out on truly breakthrough solutions. TTT-Discover finds the golden mean, directing the model's search into the most promising areas of the solution space.
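PPO manages this balance inside the policy itself, but the trade-off is easiest to see in stripped-down form. The sketch below (illustrative, not part of TTT-Discover) uses the classic epsilon-greedy rule on a two-armed bandit: mostly exploit the option that currently looks best, occasionally explore at random.

```python
import random

def epsilon_greedy(values, epsilon):
    """With probability epsilon explore a random option,
    otherwise exploit the option with the best estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(values))                # exploration
    return max(range(len(values)), key=values.__getitem__)  # exploitation

# Two-armed bandit: arm 1 is genuinely better (pays off 80% of the time).
random.seed(0)
true_means = [0.2, 0.8]
counts, values = [0, 0], [0.0, 0.0]
for _ in range(2000):
    arm = epsilon_greedy(values, epsilon=0.1)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running average
```

After a couple of thousand pulls the estimates converge and the better arm dominates the pull counts, while the 10% of random exploration keeps the search from locking in on an early local optimum.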
Openness and Reproducibility–Music for Everyone
One of the most important aspects: all results were obtained using the open model gpt-oss-120b and publicly available code. This means any researcher can repeat the experiments, verify the results, and adapt the method for their own tasks. No closed corporate “black boxes”, no secret sauces.
It's like publishing the sheet music of your composition so others can perform it, study it, and improve it. Science should be open, and TTT-Discover follows this principle.
TTT-Discover in Practice: Case Studies and Results
🏆 Wins on Different Dance Floors: Applying the Method
Theory is beautiful, but let's look at how TTT-Discover performed in practice. Researchers tested the method in four completely different fields–from abstract mathematics to applied biology. And the results? Impressive.
Mathematics: When Numbers Dance
The Erdős Discrepancy Problem
Imagine two binary sequences–strings of zeros and ones of the same length. Shift one sequence relative to the other and count how many positions agree. The task: find sequences for which this count stays minimal across every possible shift. This is a classic combinatorics problem that mathematicians have wrestled with for decades.
TTT-Discover approached the problem like a dance: it generated various binary sequences, tried different construction strategies, and the external evaluator calculated the maximum discrepancy for each variant. The model learned to create sequences with fewer and fewer matches, just as a musician learns to avoid dissonance.
The result? New records for sequences of length 32 and 36 elements. The model outperformed not only exhaustive search (which becomes practically infeasible at such lengths) but also specialized algorithms developed by mathematicians. It's as if a jazz improviser created a melody that even academic composers couldn't write.
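The evaluator's score for this task, as the article describes it, is easy to state concretely. A minimal sketch (the function name and reward convention are illustrative):

```python
def max_shift_agreement(a, b):
    """Score two equal-length binary sequences by the criterion described
    above: for every nonzero relative shift, count the positions where the
    overlapping parts agree, and return the worst (largest) such count."""
    n = len(a)
    worst = 0
    for shift in range(1, n):  # nonzero shifts only
        matches = sum(a[i] == b[i + shift] for i in range(n - shift))
        worst = max(worst, matches)
    return worst
```

In the learning loop the reward could simply be `-max_shift_agreement(a, b)`, so sequences whose worst shift has fewer agreements score higher.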
The Autocorrelation Inequality
This task is related to minimizing the autocorrelation of binary strings–a property critically important for cryptography and telecommunications. You need to find a binary sequence whose correlation with shifted copies of itself–the sum of products of the sequence and its shifted copy–stays as small as possible. The lower the autocorrelation, the better the sequence for use in secure communication systems.
TTT-Discover generated various binary strings, evaluated their autocorrelation properties, and learned to minimize the maximum value. The method managed to improve known bounds for several string lengths, which has direct practical value for cryptographic applications.
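The quantity being minimized has a compact definition. A sketch of the evaluator's metric (illustrative name; the sequence is written in ±1 form, as is conventional for autocorrelation problems):

```python
def peak_sidelobe(seq):
    """Largest absolute off-peak aperiodic autocorrelation of a ±1 sequence:
    for each nonzero shift k, sum the products of the sequence with its
    shifted copy, and return the worst absolute value over all shifts."""
    n = len(seq)
    return max(abs(sum(seq[i] * seq[i + k] for i in range(n - k)))
               for k in range(1, n))
```

The Barker code of length 7, `[1, 1, 1, -1, -1, 1, -1]`, is the classic example with the minimum possible sidelobe of 1; in the learning loop, candidates would be rewarded by something like `-peak_sidelobe(seq)`.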
GPU Engineering: When Code Must Fly
GPU kernels are small programs that run on graphics processors and are critically important for everything: from video processing to training neural networks. Optimizing them is a true art, requiring deep understanding of GPU architecture, memory management, and parallelism. It's like tuning a Formula 1 car: every microsecond counts.
At the GPUMode competition, participants were asked to optimize a specific computational kernel. TTT-Discover took on the task by generating different versions of CUDA code–NVIDIA's platform for programming its GPUs. The model experimented with:
- Data memory layout (how to arrange information more effectively for quick access)
- Register usage (ultra-fast memory inside the GPU)
- Thread block configuration (how parallel computations are organized)
- Synchronization strategies (how to coordinate the work of thousands of simultaneously running threads)
- Other low-level parameters
The external evaluator–a real GPU–ran each version of the code and measured the execution time. The model learned from the results, gradually finding increasingly efficient combinations of optimizations.
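Without access to the competition harness, the shape of such an evaluator is still easy to sketch. Below, two ordinary Python callables stand in for the baseline and candidate kernels (`speed_reward` and both functions are hypothetical names); a real setup would compile and launch CUDA code on the GPU instead.

```python
import timeit

def speed_reward(candidate, baseline, repeats=5):
    """Timing-based reward: take the best of several runs of each
    implementation to reduce measurement noise, and score the candidate
    by its speedup over the baseline (values above 1.0 mean 'faster')."""
    t_base = min(timeit.repeat(baseline, number=1, repeat=repeats))
    t_cand = min(timeit.repeat(candidate, number=1, repeat=repeats))
    return t_base / t_cand

# Stand-ins for a slow baseline kernel and an optimized candidate.
def baseline_kernel():
    total = 0
    for i in range(100_000):  # explicit Python loop: slow
        total += i
    return total

def candidate_kernel():
    return sum(range(100_000))  # builtin sum: same result, faster

reward = speed_reward(candidate_kernel, baseline_kernel)
```

Taking the minimum over several repeats is the standard trick for noisy timing measurements: the fastest run is the one least disturbed by the rest of the system.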
The final result: a speedup of up to two times compared to previous best solutions. In the world of GPU optimization, where engineers fight for every percentage of performance, a two-fold speedup is like winning the World Cup. The model learned to use GPU resources as effectively as an experienced racer uses every bit of horsepower in their car.
Algorithms: Programming as an Art
AtCoder is a platform for competitive programming where participants solve complex algorithmic tasks in limited time. This is an intellectual sport of the major leagues: you need ingenuity, deep knowledge of data structures and algorithms, and the ability to write bug-free code under pressure.
Researchers tested TTT-Discover on tasks from past AtCoder contests. The model received the problem description and examples of input and output data, then generated Python code. The evaluator ran this code on test datasets and reported whether it passed all the tests and stayed within the time and memory limits.
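That evaluator–run the code, compare output, enforce a time limit–can be sketched with the standard library alone. Everything here is illustrative (the function name, scoring as a fraction of tests passed), not the researchers' actual harness:

```python
import os
import subprocess
import sys
import tempfile

def judge(source, tests, time_limit=2.0):
    """Run a candidate Python solution on each (input, expected) pair
    under a time limit and return the fraction of tests passed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    passed = 0
    try:
        for stdin, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, path], input=stdin,
                    capture_output=True, text=True, timeout=time_limit)
                if result.stdout.strip() == expected.strip():
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # exceeding the time limit counts as a failed test
    finally:
        os.unlink(path)
    return passed / len(tests)

score = judge("print(int(input()) * 2)",
              [("3", "6"), ("5", "10"), ("4", "9")])  # third test fails
```

The returned fraction is exactly the kind of scalar reward the learning loop needs; a real judge would also check memory usage and run the code in a sandbox.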
TTT-Discover managed to solve several complex tasks that usually only experienced programmers can handle. The model didn't just generate working code–it found efficient algorithmic solutions, optimized them, and accounted for edge cases. This shows that the method is capable not only of brute-force search but also of a kind of “algorithmic intuition”.
Biology: When Data Makes More Noise Than a Carnival
In single-cell biology, researchers study gene expression in individual cells. This allows us to understand how different cells perform different functions, how diseases arise, and how drugs work. The problem is that gene expression data often contains a huge amount of noise–technical measurement artifacts, biological variability, random fluctuations.
Imagine trying to hear the quiet melody of a flute in the middle of a Brazilian carnival. De-noising is the task of filtering the data to recover the true patterns of gene expression.
TTT-Discover was applied to generate and refine de-noising algorithms. The model studied the characteristics of the noisy data and generated various approaches to filtering it: statistical methods, neural-network filters, combinations of different techniques. The evaluator analyzed the quality of the restored data, comparing it with reference sets or using biologically meaningful metrics.
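As a toy illustration of that last step (the function name and the choice of mean squared error are assumptions, not the paper's metric), a reference-based evaluator can be as simple as:

```python
def denoise_reward(denoised, reference):
    """Score a de-noised expression profile against a clean reference:
    negative mean squared error, so a better reconstruction earns a
    higher (less negative) reward."""
    assert len(denoised) == len(reference)
    n = len(reference)
    mse = sum((d - r) ** 2 for d, r in zip(denoised, reference)) / n
    return -mse
```

A perfect reconstruction scores 0. In practice, single-cell work leans on biologically grounded metrics–cell-type separability, marker-gene consistency–rather than raw MSE, but the reward-shaped interface is the same.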
The result: improved quality of de-noising, which led to more accurate identification of cell types and expression patterns. This is critically important for medical research–from cancer diagnostics to developing personalized treatments. The cleaner the data, the more accurate the diagnosis, the more effective the therapy.
Accessibility and Cost-Effectiveness of TTT-Discover
💰 Accessibility: You Don't Need a Stadium, Just a Playground
One of the most impressive aspects of TTT-Discover is its economic accessibility. All experiments were conducted using Tinker–an API from Thinking Machines, costing only a few hundred dollars per task. This isn't millions of dollars for supercomputers, nor exclusive access to closed models from corporate giants.
A few hundred dollars is within the budget of a small research group, a university lab, or even a solo enthusiast. This is the democratization of AI-based scientific search. Previously, breakthrough results required huge computational resources and closed technologies. Now a good idea, an open model, and a few hundred dollars are enough.
It's like the difference between organizing a Carnival in Rio with a million-dollar budget and throwing a vibrant party on a neighborhood playground. The scale is smaller, but the joy and energy are the same. And the results can be no less impressive.
Why TTT-Discover is a Breakthrough in AI
🌟 What Makes TTT-Discover Special
Let's take stock and look at the key differences between this approach and what existed before.
Quality Over Quantity
Traditional machine learning strives for good results “on average”. A model needs to work decently on thousands of different tasks. TTT-Discover flips this logic: the goal is not to be good everywhere, but to be excellent in one specific place.
This is like the difference between a good club dancer who can do everything from salsa to hip-hop at an acceptable level, and a professional samba dancer who has honed their art to perfection. The second might not know how to waltz, but in samba they have no equal.
Targeted Specialization
The model doesn't spread itself thin over generalization. It dives into the specifics of a particular problem, studies its nuances, adapts to its peculiarities. This is a deep immersion, like an archaeologist digging in one spot but reaching the most ancient layers.
In the process of TTT-Discover, the model can uncover patterns and strategies that are unique specifically to this task and might have been missed by a universal approach.
A Living, Evolving System
The model is not static. It lives, breathes, and develops right during the task solution. Every iteration is a step in evolution, every reward is a signal for adaptation. This is an organic process reminiscent of natural selection, but accelerated and directed.
Future Applications of TTT-Discover Technology
🔮 Where This Road Leads
The potential of TTT-Discover goes far beyond the successes already demonstrated. The method can be applied to the widest range of tasks where finding not just a good, but the best solution is required.
Materials Science
Imagine searching for new materials with specific properties–super-strong, super-light, superconducting. The model could generate molecular structures, and simulators would evaluate their physical characteristics. TTT-Discover could accelerate the discovery of materials of the future.
Drug Development
Searching for new candidate molecules to treat diseases is a task with a huge search space. The model could generate structures of potential drugs, while biological simulators evaluate their efficacy and safety. This could cut the time for developing new drugs from decades to years.
Logistics and Optimization
Optimizing delivery routes, managing warehouse stocks, planning production–all these tasks require finding the best solutions in complex conditions with many constraints. TTT-Discover could find solutions that save millions of dollars and tons of fuel.
Mathematical Proofs
Can a model not just find numerical solutions but also generate formal mathematical proofs? This is one of the most ambitious goals. If TTT-Discover can handle this, it will open a new era in mathematics where AI becomes a partner to mathematicians in the search for proofs.
The Philosophy and Impact of TTT-Discover for AI Research
🎸 The Final Chord
TTT-Discover shows us a new path for using artificial intelligence in science. It's not about making a machine follow instructions faster than a human. It's about giving the machine the opportunity to learn to discover, explore, and improve right in the process of solving a task.
Traditional approaches work like a recording studio: everything is thought out, rehearsed, recorded. TTT-Discover is a live concert where every performance is unique, where musicians improvise, adapt, and react to the energy of the hall. And it is in such moments that true magic is born.
The method demonstrates that breakthrough results don't require astronomical budgets and closed technologies. Open models, public code, a few hundred dollars–and you can solve tasks that have baffled specialists for years. This is the democratization of scientific search, making cutting-edge research accessible to a much wider circle of people.
But the most important thing is the philosophy of the approach. Instead of trying to create a universal tool for everything, TTT-Discover focuses on achieving perfection in one thing. This is a lesson not only for machine learning but also for life: sometimes it is better to be a master of one trade than a mediocrity in a hundred.
Algorithms aren't better than us–they are just different. And when we learn to use their strengths correctly, when we allow them to learn and develop right in the process of solving a task, we open doors to solutions that seemed unreachable. This is a dance between human ingenuity and machine adaptability, between algorithmic precision and creative search.
And this dance is only just beginning. 🎉