Published on March 5, 2026

Teaching a Compact Computer to Control a Robot: A Case Study in On-Device AI

NXP and Hugging Face explain how to train robotic artificial intelligence on custom data and run it on a low-power embedded device.

Development | Event Source: Hugging Face | 6–8 min read

When we think of AI-powered robots, we usually picture something large, connected to a powerful server somewhere in the cloud. But what if a robot needs to operate autonomously – without a constant internet connection, without a powerful GPU nearby, running directly 'onboard'? This is precisely the challenge that engineers at NXP and Hugging Face tackled, detailing their results in an in-depth technical blog post.

This isn't just an abstract experiment. It's a practical guide on how to take a modern AI model for robot control, train it on your own data, and run it on a small embedded device – one that can fit inside the robot's actual chassis.

What is VLA and Why is it Needed for Robot Control?

To understand what this is all about, we need to break down the term. VLA stands for Vision-Language-Action. It's a type of AI model that can perceive an image from a camera, understand a text command, and, based on both, decide on a physical action – for example, where to move a robotic arm or how to pick up an object.

For example: you tell the robot, 'pick up the red block,' it looks around with its camera, finds the block, and picks it up. The model 'sees,' 'understands,' and 'acts' in a single pass – hence the name.
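The 'see, understand, act' loop can be sketched as a simple input/output contract. The snippet below is only an illustration, not a real model: `Observation`, `Action`, and `ToyRedBlockPolicy` are hypothetical names, and a hand-written rule stands in for the trained network.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


@dataclass
class Observation:
    image: List[List[Pixel]]  # camera frame as rows of pixels
    instruction: str          # natural-language command


@dataclass
class Action:
    dx: float            # horizontal offset from the image center, in pixels
    dy: float            # vertical offset from the image center, in pixels
    close_gripper: bool  # whether to grasp at the target


class ToyRedBlockPolicy:
    """Hand-written stand-in for a trained VLA model (hypothetical)."""

    def predict(self, obs: Observation) -> Action:
        height, width = len(obs.image), len(obs.image[0])

        def redness(position):
            row, col = position
            red, green, blue = obs.image[row][col]
            return red - max(green, blue)

        # "See": find the pixel where red dominates the other channels most.
        positions = [(r, c) for r in range(height) for c in range(width)]
        target_row, target_col = max(positions, key=redness)
        # "Understand": a crude keyword check replaces language understanding.
        wants_grasp = "pick" in obs.instruction.lower()
        # "Act": report where to move relative to the image center.
        return Action(
            dx=target_col - (width - 1) / 2,
            dy=target_row - (height - 1) / 2,
            close_gripper=wants_grasp,
        )


black, red = (0, 0, 0), (255, 0, 0)
frame = [[black, black, red],
         [black, black, black],
         [black, black, black]]
action = ToyRedBlockPolicy().predict(Observation(frame, "Pick up the red block"))
print(action)
```

A real VLA replaces the hand-written rule with a learned network, but the shape of the interface (image and text in, action out) stays the same.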

Such models already exist and show impressive results in laboratory settings. The problem is that they typically require significant computational resources. Running them on a small embedded chip is another story entirely.

Collecting Data for AI Robot Control by Hand

Every AI model learns from data. For robotic systems, this means recorded examples of the robot performing tasks: camera footage, joint positions, and control commands. The more numerous and diverse the examples, the better the model understands what is required of it.

The project used a custom dataset recorded by hand. An operator controlled a robotic manipulator to demonstrate the desired behavior while the system recorded everything. This approach is called learning from demonstration: the model is trained on recordings of a human performing the task and learns to replicate that behavior.

An important point: the data was recorded in a standardized format compatible with the Hugging Face ecosystem. This means it can be reused, shared with the community, and applied with other tools without needing additional conversion.
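To make the idea of a recorded demonstration concrete, here is a minimal sketch of what one episode might contain. The `Frame` and `Episode` classes and all field names are hypothetical and chosen for illustration; this is not the actual Hugging Face dataset format, which the article does not detail.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Frame:
    timestamp_s: float             # time since the episode started
    image_file: str                # path to the saved camera frame
    joint_positions: List[float]   # measured state of the manipulator
    action: List[float]            # command the operator issued at this step


@dataclass
class Episode:
    task: str                      # text command describing the demonstration
    frames: List[Frame] = field(default_factory=list)

    def record(self, frame: Frame) -> None:
        self.frames.append(frame)

    def to_json_lines(self) -> str:
        # One JSON object per time step, each tagged with the task text.
        return "\n".join(
            json.dumps({"task": self.task, **asdict(frame)})
            for frame in self.frames
        )


episode = Episode(task="pick up the red block")
episode.record(Frame(0.0, "frame_000.png", [0.0, 0.5], [0.1, 0.5]))
episode.record(Frame(0.1, "frame_001.png", [0.1, 0.5], [0.2, 0.4]))
lines = episode.to_json_lines().splitlines()
print(lines[0])
```

Each time step pairs what the robot observed (image, joint positions) with what the operator did (action), and those pairs are what the model later learns from.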

Fine-Tuning a Pre-trained AI Model for Robotic Tasks

Training a model from scratch is expensive and time-consuming. That's why the project used fine-tuning: taking an already trained model with general capabilities and adapting it to a specific task and a specific robot.

It's similar to how an experienced chef, who knows how to cook a wide variety of dishes, works in a specific restaurant for a few weeks to get accustomed to its menu, equipment, and presentation style. They already have the basic skills – they're just adapting.

In this case, the authors started with SmolVLM, a compact multimodal model from Hugging Face that works with images and text. They fine-tuned it on their recorded data and added a 'head' that predicts the robot's actions. The result was a model that understands natural language commands, analyzes the camera image, and outputs control signals for the manipulator.
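The idea of adding an action 'head' on top of a pretrained model can be illustrated with a toy example. The sketch below assumes, purely for simplicity, that the pretrained part stays frozen and that the head is a single linear layer trained on demonstrations; the article does not say either is true of the real project. `frozen_backbone`, `train_action_head`, and `demonstrated_action` are hypothetical names.

```python
def frozen_backbone(image_feature, instruction_flag):
    """Stand-in for the pretrained model: fixed features, never updated."""
    return [image_feature, instruction_flag, 1.0]  # last entry acts as a bias


def demonstrated_action(image_feature, instruction_flag):
    """The operator's behavior that the head should imitate (made up)."""
    return 0.8 * image_feature - 0.5 * instruction_flag + 0.1


def train_action_head(demos, epochs=500, learning_rate=0.1):
    """Fit a linear head to the demonstrations with stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (image_feature, instruction_flag), target in demos:
            features = frozen_backbone(image_feature, instruction_flag)
            prediction = sum(w * f for w, f in zip(weights, features))
            error = prediction - target
            # Gradient step on the squared error; only the head's weights change.
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
    return weights


inputs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 1.0)]
demos = [(x, demonstrated_action(*x)) for x in inputs]
head = train_action_head(demos)

# Check the head on an input that was not in the demonstrations.
held_out = frozen_backbone(0.25, 1.0)
predicted = sum(w * f for w, f in zip(head, held_out))
print(round(predicted, 3), round(demonstrated_action(0.25, 1.0), 3))
```

The real model predicts multi-dimensional actions from image and text features, but the principle is the same: the pretrained model supplies general-purpose features, and training on demonstrations teaches the head to map them to robot commands.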

The Challenge of Fitting AI Models on Tiny Chips

This is where things get really interesting from an engineering perspective. Even a VLA that is compact by large model standards still puts a serious load on an embedded device. Smartphones, and especially specialized robotics boards, are far less powerful than cloud servers.

To get the model running on the target platform – the NXP i.MX 95 processor – it had to be significantly optimized. Two techniques stand out:

  • Quantization – storing the model's numerical values at lower precision, for example as 8-bit integers instead of 32-bit floating-point numbers. Roughly speaking, rounded values replace very precise ones, which reduces the model's size and speeds up calculations, usually with only a small loss of quality.
  • Hardware-specific compilation – the model is converted into a format optimized specifically for the architecture of the chip being used, allowing it to perform calculations as efficiently as possible.
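The quantization step can be illustrated with a minimal sketch of symmetric 8-bit quantization. This is a generic textbook scheme shown for intuition; the article does not specify which quantization method the authors applied, and the weight values below are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.003, 0.45]  # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)   # integers that fit in one byte each
print(max_error)   # rounding error, at most about half of the scale
```

Each weight now fits in one byte instead of four, and the rounding error per value is bounded by roughly half the scale factor, which is why quality typically degrades only slightly.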

As a result, they succeeded in running the model directly on the device – without the cloud, without an external server. The robot receives a command, processes the image, and makes its decision locally.

Why On-Device AI Processing is Critical for Robots

The question may arise: why go to all this trouble? After all, you could just send data to the cloud and get a response from there.

There are several reasons. First, latency. For robots, especially those operating in real time, even a few dozen milliseconds of delay can be critical. Local processing removes the network round trip from the control loop.
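A quick back-of-the-envelope calculation shows why the round trip matters. All numbers below are illustrative assumptions, not measurements from the project: a 20 Hz control loop, 40 ms of on-device inference, and a cloud setup with 10 ms of inference plus a 60 ms network round trip. The function name is hypothetical.

```python
def fits_control_budget(control_rate_hz, inference_ms, network_round_trip_ms=0.0):
    """Return (fits, slack_ms) for one control step at the given loop rate."""
    budget_ms = 1000.0 / control_rate_hz
    total_ms = inference_ms + network_round_trip_ms
    return total_ms <= budget_ms, budget_ms - total_ms


# Assumed numbers for illustration only.
on_device = fits_control_budget(control_rate_hz=20, inference_ms=40)
via_cloud = fits_control_budget(control_rate_hz=20, inference_ms=10,
                                network_round_trip_ms=60)

print(on_device)  # (True, 10.0): 10 ms to spare in a 50 ms step
print(via_cloud)  # (False, -20.0): 20 ms over budget despite faster inference
```

Under these assumptions, a faster server still misses the deadline because the network delay alone exceeds the time available per control step.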

Second, reliability. A robot in a factory or out in the field doesn't always have a stable network connection. If the intelligence is right on board, a loss of connectivity doesn't halt its operation.

Third, privacy and security. Data from cameras and sensors doesn't leave for external servers – it's processed locally.

This is especially relevant for industrial robotics, autonomous vehicles, medical devices, and other fields where reliability and autonomy are not just conveniences, but requirements.

An Open Approach to Replicate On-Device AI for Robotics

One of the notable aspects of this project is its openness. The authors didn't just share their results; they described the entire process: how the data was recorded, how the fine-tuning was performed, and what optimizations were applied and why.

The tools and data formats used are based on open standards from the Hugging Face ecosystem. This means that a team working on their own robot can use this experience as a foundation, without reinventing the wheel. Recording your own demonstrations, fine-tuning the model, optimizing for your hardware – the entire pathway is now documented.

This isn't a revolution, but it is a significant practical contribution: previously, this kind of knowledge was concentrated in the closed labs of large corporations, and now it's becoming more accessible.

Applications and Future of Embedded AI for Robots and Autonomous Devices

Embedded AI for robots isn't just about industrial manipulators. We're talking about a wide range of devices: assistant robots, autonomous drones, maintenance systems, logistics and warehouse robots, and educational platforms.

In all these cases, there's a common requirement: the device must operate autonomously, react quickly, and not depend on a constant server connection. This is precisely what the described project demonstrates.

Of course, for now, we are talking about relatively simple tasks – grasping and moving objects in a controlled environment. We're still a long way from a fully autonomous robot capable of handling unpredictable environments. But the direction is clearly set: compact, autonomous, trained on real-world data – all on a device the size of a small board.

Original Title: Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations
Publication Date: Mar 5, 2026
Source: Hugging Face (huggingface.co), a U.S.-based open platform and company for hosting, training, and sharing AI models.

From Source to Analysis

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text – Claude Sonnet 4.6 (Anthropic). The neural network studies the original material and generates a coherent text.
2. Translation into English – Gemini 2.5 Pro (Google DeepMind).
3. Text Review and Editing – Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.
4. Preparing the Illustration Description – DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.
5. Creating the Illustration – FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
