Published on March 26, 2026

EgoVerse: Teaching Robots Human Movements Using First-Person Video

EgoVerse is an open-source system for training robots using human first-person video, developed by a consortium of leading research teams.

Research. Event source: Scale AI. Reading time: 4 to 6 minutes.

One of the biggest challenges in robotics sounds simple: how do you teach a robot to do what a human does? Not in theory, but in practice – picking up an object, opening a door, or helping in the kitchen. The world holds a vast amount of data on human behavior, but robots can hardly use it directly. Their bodies are built differently, their cameras sit in other positions, and their movements do not match human ones.

This is precisely the problem that EgoVerse aims to solve. It is an open research initiative created by a consortium involving Georgia Tech, Stanford, Meta, and several other teams.

Why First-Person Video Is Important for Robot Training

When a person does something – cooking, assembling furniture, or arranging items – they see the world from a specific point of view: their own eyes. This is what's known as 'egocentric,' or first-person, video. Its distinguishing feature is that it doesn't show the action from an outside perspective, but rather how the person themselves perceives the space while performing the task.

This is crucial for robot training. Most robots also 'see' the world from their own body – through cameras built into their heads or arms. If they are trained on video shot from a similar angle, the data becomes much more applicable. Simply put, it's easier for a robot to acquire a skill if it has 'seen' it in the same way it perceives the world itself.

EgoVerse: An Open Foundation for Robotics Development

EgoVerse is positioned specifically as an open foundation – a 'recipe' that other teams can use, adapt, and build upon. This is a significant decision, as most major developments in robotics remain within labs or companies, inaccessible to the broader research community.

A different path was chosen here. The consortium is publishing not just its results, but its methodology: how to collect data, how to process it, and how to structure the skill transfer process from human to robot. This allows other teams to avoid starting from scratch and instead build on already proven approaches.

Scalable Robot Learning: The Core Idea Behind EgoVerse

The key word in the description of EgoVerse is 'scalability.' This means the system is designed to work not just within a single lab with one dataset, but to grow as the volume of information increases.

Traditionally, training robots has required vast amounts of manual labeling, specially designed scenarios, and costly experiments. EgoVerse proposes an approach where real-world human video – potentially millions of hours of footage – becomes usable for robot training without the need to recreate artificial conditions each time.

This doesn't mean everything is solved automatically, but it is a step toward narrowing the gap between 'human data' and 'robot data'.

Who Is Behind the EgoVerse Project?

The consortium that developed EgoVerse brings together several powerhouse research centers: Georgia Tech, Stanford, Meta, and other participants. This collaboration is significant in itself – robotics and transfer learning are becoming fields where it is increasingly difficult for individual teams to work in isolation.

These joint efforts make it possible not only to pool expertise but also to establish a common infrastructure: unified data formats, shared evaluation metrics, and compatible tools. This is something often lacking in academic robotics, where each lab tends to operate according to its own standards.
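To make the idea of a unified data format and a shared metric concrete, here is a minimal Python sketch. It is purely illustrative: the `Frame` and `Episode` classes, their field names, and the `mean_duration_s` metric are assumptions made up for this example and are not taken from EgoVerse or its published materials.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical shared record: none of these names come from EgoVerse.
@dataclass
class Frame:
    timestamp_s: float                       # seconds since the episode started
    image_path: str                          # first-person camera frame on disk
    effector_xyz: Tuple[float, float, float]  # hand or gripper position, metres


@dataclass
class Episode:
    source: str                              # "human" or "robot"
    task: str                                # free-text task label
    frames: List[Frame] = field(default_factory=list)

    def duration_s(self) -> float:
        """Elapsed time between the first and last frame."""
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1].timestamp_s - self.frames[0].timestamp_s


def mean_duration_s(episodes: List[Episode]) -> float:
    """One metric, computed identically for human and robot episodes."""
    if not episodes:
        return 0.0
    return sum(e.duration_s() for e in episodes) / len(episodes)


# Illustrative data only: a human and a robot recording of the same task.
human_demo = Episode("human", "open drawer", [
    Frame(0.0, "human_000.png", (0.10, 0.00, 0.40)),
    Frame(2.0, "human_001.png", (0.10, 0.00, 0.30)),
])
robot_demo = Episode("robot", "open drawer", [
    Frame(0.0, "robot_000.png", (0.12, 0.01, 0.41)),
    Frame(4.0, "robot_001.png", (0.12, 0.01, 0.31)),
])
example_mean = mean_duration_s([human_demo, robot_demo])
```

Because both recordings use the same schema, any tool written against `Episode` works on either source, which is the practical benefit a shared format is meant to deliver.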

Practical Impact of EgoVerse on Robotics Development

If EgoVerse lives up to its ambitions, it could change how researchers approach the creation of general-purpose robots – those capable of performing a variety of tasks in a real home or on a factory floor, rather than only in strictly controlled environments.

Currently, most robots perform well in highly specialized scenarios: one task, one environment, and clearly defined parameters. If anything changes, the system often fails. Training on diverse, first-person data has the potential to make robot behavior more flexible and resilient to change.

At the same time, it is important to understand that EgoVerse is a foundation, not a finished product. It's a set of principles and methods that still need to be validated through widespread practice. The project's open nature is intended to facilitate this testing, allowing it to happen more quickly and under more varied conditions.

Open Questions and Challenges for EgoVerse

Transferring skills from humans to robots is a challenge that researchers have been tackling for many years, and there is still no one-size-fits-all solution. Bodies are structured differently, the available degrees of freedom differ, and the physics of interacting with objects is not the same for humans and robots.
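A toy Python sketch can show why differing bodies matter. It maps a human wrist position onto a robot by uniform scaling and then clamps the result to the robot's reachable box. This is a deliberately naive illustration, not EgoVerse's method: the function name, the scale factor, and the workspace limits are all assumptions chosen for the example.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def retarget_position(human_xyz: Vec3, scale: float,
                      lower: Vec3, upper: Vec3) -> Vec3:
    """Scale a human wrist position, then clamp each axis to the robot's reach.

    Hypothetical helper for illustration only.
    """
    x, y, z = (
        min(max(scale * p, lo), hi)
        for p, lo, hi in zip(human_xyz, lower, upper)
    )
    return (x, y, z)


# Assumed numbers: a human reach in metres and a smaller robot workspace.
robot_lower = (-0.4, -0.4, 0.0)
robot_upper = (0.4, 0.4, 0.6)
target = retarget_position((0.9, -0.2, 1.4), 0.5, robot_lower, robot_upper)
# target is (0.4, -0.1, 0.6): x and z hit the workspace limits
```

Two of the three axes end up clamped, so part of the original human motion is simply lost. That loss is one small instance of the mismatch described above.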

EgoVerse is betting that a common viewing angle and proper data processing can partially bridge this gap. Whether this will work outside of laboratory settings – and, just as importantly, how the community will leverage the project's open-source materials – remains to be seen.

Original Title: EgoVerse: An open-source recipe for human-to-robot transfer
Publication Date: Mar 25, 2026
Source: Scale AI (scale.com), a U.S.-based company providing labeled data and infrastructure for training AI models.

From Source to Analysis

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
