Published March 4, 2026

How to Train an Image Generation Model in 24 Hours: The Photoroom Team's Experience

The Photoroom team shares how they managed to train their own image generation model in just 24 hours and what the results were.

Source: Hugging Face

When most people hear the phrase "AI model training", they picture something massive: months of computations, huge datasets, and dozens of engineers. This is partly true, especially for the industry's biggest players. But the Photoroom team decided to see just how far they could get in 24 hours. The result was interesting enough to warrant a detailed article.

What is PRX and Why Does It Matter?

Photoroom is an image editing service primarily focused on e-commerce, offering features like background removal, environment replacement, and product photo creation. For such a product, the quality of image generation isn't just an academic interest; it's a direct business necessity.

PRX is the internal name for their series of experiments with custom text-to-image generation models. The third part of this series was dedicated to a specific challenge: Is it possible to complete the entire model training cycle – from data preparation to a working result – in just one day?

This wasn't a challenge for its own sake. Behind this limitation lies a very practical question: How quickly can a small team iterate, test hypotheses, and make progress without spending weeks on each experiment?

24 Hours: A Serious Constraint

When people talk about training an image generation model, they usually mean a process that takes days, if not weeks, on modern hardware. In this case, the team set a strict limitation for themselves: one day from start to finish.

To meet this deadline, they had to make specific decisions at every stage: what volume of data to use, how to organize the training process, and at what point to consider the result "good enough" for the next step. Simply put, this was an exercise not just in technology, but also in prioritization.

An important point: the team didn't build the model from scratch. They started with an existing foundation – a pre-trained model that they then adapted for their needs. This is standard industry practice: take something that already knows how to "understand" images and text, and then continue training it on specific data. This approach is called fine-tuning; it's like retraining a specialist for a specific job instead of teaching them from the ground up.
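
The idea can be shown with a deliberately tiny sketch that has nothing to do with Photoroom's actual stack: "pretrain" a one-parameter-pair linear model on broad data, then continue gradient descent from those weights on a small, task-specific dataset. All numbers and function names here are illustrative.

```python
# Toy illustration of fine-tuning: continue training from pretrained
# weights instead of starting from random initialization.
# (Hypothetical example; Photoroom fine-tuned a real text-to-image model.)

def train(w, b, data, lr=0.05, steps=200):
    """Plain gradient descent on mean squared error for y = w*x + b."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

# "Pretraining": learn a generic relation y = 2x on broad data.
pretrain = [(x / 10, 2 * (x / 10)) for x in range(-10, 11)]
w, b = train(0.0, 0.0, pretrain)

# "Fine-tuning": a small domain-specific dataset where y = 2x + 1.
# Starting from the pretrained w keeps what the model already knows.
finetune = [(x / 10, 2 * (x / 10) + 1) for x in range(-3, 4)]
w_ft, b_ft = train(w, b, finetune)

print(round(w_ft, 2), round(b_ft, 2))  # close to 2.0 and 1.0
```

The fine-tuning run only has to learn the small shift (the bias) because the slope carries over from pretraining: with far less data and compute than training from zero would need, which is exactly why the approach fits a 24-hour budget.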

What Was the Desired Outcome?

The goal wasn't just to "train something", but to get a model that could handle tasks specifically important for Photoroom. This primarily meant high-quality generation of product images: items on a clean background, realistic e-commerce scenes, and adherence to text descriptions.

This distinguishes the experiment from typical academic benchmarks, where a model is evaluated on a wide range of abstract tasks. Here, the metric for success was practical: How well does the model perform the tasks that users of the service actually need?

Data Is Half the Battle

One of the key takeaways that runs through the publication is that the quality of the training data is just as important as the model's architecture or computational resources.

The team paid considerable attention to preparing the dataset – selecting, filtering, and labeling images. This is labor-intensive work that's easy to underestimate if you only focus on the "glamorous" part – the model itself. But this is precisely where the foundation is laid for what the model will ultimately learn.

In short: garbage in, garbage out. This rule applies just as strictly in AI as it does in any other engineering discipline. Therefore, a significant portion of the 24-hour marathon was spent not on the training itself, but on ensuring the data was in order.
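
To make the filtering step concrete, here is a minimal sketch of the kind of rule-based cleanup such pipelines often start with. The field names and thresholds are assumptions for illustration, not Photoroom's actual criteria.

```python
# Hypothetical dataset-filtering sketch: drop records that are unlikely
# to teach the model anything useful (illustrative thresholds).

MIN_SIDE = 512          # reject low-resolution images
MIN_CAPTION_WORDS = 3   # reject captions too short for text alignment

def keep(sample: dict) -> bool:
    """Return True if an (image metadata, caption) record passes the filters."""
    big_enough = min(sample["width"], sample["height"]) >= MIN_SIDE
    described = len(sample["caption"].split()) >= MIN_CAPTION_WORDS
    return big_enough and described

raw = [
    {"width": 1024, "height": 768, "caption": "red sneaker on white background"},
    {"width": 200,  "height": 200, "caption": "blurry thumbnail image here"},
    {"width": 800,  "height": 800, "caption": "chair"},
]

clean = [s for s in raw if keep(s)]
print(len(clean))  # only the first record passes both filters
```

Real pipelines typically layer model-based checks (aesthetic scoring, caption-image similarity) on top of simple rules like these, but even this level of filtering removes a surprising amount of noise.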

What Were the Final Results?

The experiment's results were promising – with some caveats. The model did learn to generate images that matched descriptions and handled Photoroom's core tasks at an acceptable level.

At the same time, the team is honest about the limitations: it's impossible to create a model in 24 hours that can compete with flagship solutions developed over years. But that wasn't the goal. The goal was to test whether the approach itself works as a method for rapid iteration.

And in this regard, the answer was positive. The team not only managed to get a working model but also gathered specific insights into what affects quality, where the bottlenecks are, and what should be improved in the next cycle.

Why Is This Interesting for More Than Just Photoroom?

The story of "we trained a model in a day" might sound like a marketing slogan. But behind it lies a broader observation relevant to the entire industry.

Previously, the ability to train custom image generation models was concentrated in the hands of a few large companies with vast resources. Gradually, the barrier to entry is lowering: tools are becoming more accessible, methods more efficient, and accumulated knowledge is spreading faster.

Photoroom's experiment is one example of how a relatively small team can meaningfully work with technology that, until recently, seemed exclusive to the biggest players. This doesn't mean resources are no longer important – they are. But the gap is narrowing.

Furthermore, the public description of such an experiment – with its specific decisions, observations, and honest limitations – is valuable to the community in its own right. Few teams are willing to share not only their successes but also what didn't work or required compromises.

Open Questions

Several things in this story remain behind the scenes or require cautious interpretation.

First, a «working model in 24 hours» is still an experiment within a specific context: a particular team, specific data, and certain hardware. Replicating this result in a different context is not a trivial task.

Second, the model's quality was evaluated based on Photoroom's internal criteria. How these results compare to broader benchmarks is an open question.

Third, the question of cost remains behind the scenes: How much compute was consumed during those 24 hours? The time limit is one metric, but the financial aspect of the experiment is not detailed in the publication.

All of this doesn't diminish the experiment's value, but it's worth keeping in mind when interpreting the conclusions.

If you're interested in rapid iteration in model training or the specifics of working with images for e-commerce, Photoroom's publication is worth a read. It has enough concrete details to offer something useful, even if you don't plan to replicate the experiment exactly.

Original Title: PRX Part 3 – Training a Text-to-Image Model in 24h!
Publication Date: Mar 3, 2026
Hugging Face (huggingface.co): a U.S.-based open platform and company for hosting, training, and sharing AI models.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The model studies the source material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind): Translation into English.

3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
