Published March 4, 2026

How to Train an Image Generation Model in 24 Hours: The Photoroom Team's Experience

The Photoroom team shares how they managed to train their own image generation model in just 24 hours and what the results were.

Source: Hugging Face

When most people hear the phrase "AI model training", they picture something massive: months of computations, huge datasets, and dozens of engineers. This is partly true, especially for the industry's biggest players. But the Photoroom team decided to see just how far they could get in 24 hours. The result was interesting enough to warrant a detailed article.

What is PRX and Why Does It Matter?

Photoroom is an image editing service primarily focused on e-commerce, offering features like background removal, environment replacement, and product photo creation. For such a product, the quality of image generation isn't just an academic interest; it's a direct business necessity.

PRX is the internal name for their series of experiments with custom text-to-image generation models. The third part of this series was dedicated to a specific challenge: Is it possible to complete the entire model training cycle – from data preparation to a working result – in just one day?

This wasn't a challenge for its own sake. Behind this limitation lies a very practical question: How quickly can a small team iterate, test hypotheses, and make progress without spending weeks on each experiment?

24 Hours: A Serious Constraint

When people talk about training an image generation model, they usually mean a process that takes days, if not weeks, on modern hardware. In this case, the team set a strict limitation for themselves: one day from start to finish.

To meet this deadline, they had to make specific decisions at every stage: what volume of data to use, how to organize the training process, and at what point to consider the result "good enough" for the next step. Simply put, this was an exercise not just in technology, but also in prioritization.

An important point: the team didn't build the model from scratch. They started with an existing foundation – a pre-trained model that they then adapted for their needs. This is standard industry practice: take something that already knows how to "understand" images and text, and then continue training it on specific data. This approach is called fine-tuning; it's like retraining a specialist for a specific job instead of teaching them from the ground up.
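
The idea can be shown with a deliberately tiny sketch that has nothing to do with Photoroom's actual stack: "pretrain" a one-parameter-pair linear model on broad data, then continue gradient descent from those weights on a small, task-specific dataset. All numbers and function names here are illustrative.

```python
# Toy illustration of fine-tuning: continue training from pretrained
# weights instead of starting from random initialization.
# (Hypothetical example; Photoroom fine-tuned a real text-to-image model.)

def train(w, b, data, lr=0.05, steps=200):
    """Plain gradient descent on mean squared error for y = w*x + b."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

# "Pretraining": learn a generic relation y = 2x on broad data.
pretrain = [(x / 10, 2 * (x / 10)) for x in range(-10, 11)]
w, b = train(0.0, 0.0, pretrain)

# "Fine-tuning": a small domain-specific dataset where y = 2x + 1.
# Starting from the pretrained w keeps what the model already knows.
finetune = [(x / 10, 2 * (x / 10) + 1) for x in range(-3, 4)]
w_ft, b_ft = train(w, b, finetune)

print(round(w_ft, 2), round(b_ft, 2))  # close to 2.0 and 1.0
```

The fine-tuning run only has to learn the small shift (the bias) because the slope carries over from pretraining: with far less data and compute than training from zero would need, which is exactly why the approach fits a 24-hour budget.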

What Was the Desired Outcome?

The goal wasn't just to "train something", but to get a model that could handle tasks specifically important for Photoroom. This primarily meant high-quality generation of product images: items on a clean background, realistic e-commerce scenes, and adherence to text descriptions.

This distinguishes the experiment from typical academic benchmarks, where a model is evaluated on a wide range of abstract tasks. Here, the metric for success was practical: How well does the model perform the tasks that users of the service actually need?

Data Is Half the Battle

One of the key takeaways that runs through the publication is that the quality of the training data is just as important as the model's architecture or computational resources.

The team paid considerable attention to preparing the dataset – selecting, filtering, and labeling images. This is labor-intensive work that's easy to underestimate if you only focus on the "glamorous" part – the model itself. But this is precisely where the foundation is laid for what the model will ultimately learn.

In short: garbage in, garbage out. This rule applies just as strictly in AI as it does in any other engineering discipline. Therefore, a significant portion of the 24-hour marathon was spent not on the training itself, but on ensuring the data was in order.
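
To make the filtering step concrete, here is a minimal sketch of the kind of rule-based cleanup such pipelines often start with. The field names and thresholds are assumptions for illustration, not Photoroom's actual criteria.

```python
# Hypothetical dataset-filtering sketch: drop records that are unlikely
# to teach the model anything useful (illustrative thresholds).

MIN_SIDE = 512          # reject low-resolution images
MIN_CAPTION_WORDS = 3   # reject captions too short for text alignment

def keep(sample: dict) -> bool:
    """Return True if an (image metadata, caption) record passes the filters."""
    big_enough = min(sample["width"], sample["height"]) >= MIN_SIDE
    described = len(sample["caption"].split()) >= MIN_CAPTION_WORDS
    return big_enough and described

raw = [
    {"width": 1024, "height": 768, "caption": "red sneaker on white background"},
    {"width": 200,  "height": 200, "caption": "blurry thumbnail image here"},
    {"width": 800,  "height": 800, "caption": "chair"},
]

clean = [s for s in raw if keep(s)]
print(len(clean))  # only the first record passes both filters
```

Real pipelines typically layer model-based checks (aesthetic scoring, caption-image similarity) on top of simple rules like these, but even this level of filtering removes a surprising amount of noise.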

What Were the Final Results?

The experiment's results were promising – with some caveats. The model did learn to generate images that matched descriptions and handled Photoroom's core tasks at an acceptable level.

At the same time, the team is honest about the limitations: it's impossible to create a model in 24 hours that can compete with flagship solutions developed over years. But that wasn't the goal. The goal was to test whether the approach itself works as a method for rapid iteration.

And in this regard, the answer was positive. The team not only managed to get a working model but also gathered specific insights into what affects quality, where the bottlenecks are, and what should be improved in the next cycle.

Why Is This Interesting for More Than Just Photoroom?

The story of "we trained a model in a day" might sound like a marketing slogan. But behind it lies a broader observation relevant to the entire industry.

Previously, the ability to train custom image generation models was concentrated in the hands of a few large companies with vast resources. Gradually, the barrier to entry is lowering: tools are becoming more accessible, methods more efficient, and accumulated knowledge is spreading faster.

Photoroom's experiment is one example of how a relatively small team can meaningfully work with technology that, until recently, seemed exclusive to the biggest players. This doesn't mean resources are no longer important – they are. But the gap is narrowing.

Furthermore, the public description of such an experiment – with its specific decisions, observations, and honest limitations – is valuable to the community in its own right. Few teams are willing to share not only their successes but also what didn't work or required compromises.

Open Questions

Several things in this story remain behind the scenes or require cautious interpretation.

First, a «working model in 24 hours» is still an experiment within a specific context: a particular team, specific data, and certain hardware. Replicating this result in a different context is not a trivial task.

Second, the model's quality was evaluated based on Photoroom's internal criteria. How these results compare to broader benchmarks is an open question.

Third, the question of cost remains behind the scenes: How much compute was consumed during those 24 hours? The time limit is one metric, but the financial aspect of the experiment is not detailed in the publication.

All of this doesn't diminish the experiment's value, but it's worth keeping in mind when interpreting the conclusions.

If you're interested in rapid iteration in model training or the specifics of working with images for e-commerce, Photoroom's publication is worth a read. It has enough concrete details to offer something useful, even if you don't plan to replicate the experiment exactly.

Original Title: PRX Part 3 – Training a Text-to-Image Model in 24h!
Publication Date: Mar 3, 2026
Hugging Face (huggingface.co): a U.S.-based open platform and company for hosting, training, and sharing AI models.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The model studies the source material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind): Translation into English.

3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
