NVIDIA Research Advances Robotics From Simulation to the Real World


Robotics is entering a new phase: moving from controlled demos and scripted automation toward generalizable, reliable embodied autonomy in the real world. 

At the International Conference on Robotics and Automation (ICRA), eight of NVIDIA Research’s 28 accepted papers show how simulation-to-real transfer is becoming a foundation for that shift, helping robots perceive, reason, plan and act across dynamic, unpredictable environments.

Together, the papers span the full stack of challenges robot developers face: coordinating multiple arms in parallel, building policies that generalize across robot bodies, grasping novel objects in clutter, performing precise assembly and developing vision-language-action models that reason before they move. 

The common thread: sim-to-real transfer is becoming the basis for robots that can adapt, generalize and operate with greater reliability outside the lab.

Coordinating Arms, Navigating Bodies, Grasping Objects

Picture a pharmaceutical lab run by robotic arms: picking up tubes, transferring liquids, mixing reagents — each step taking different amounts of time, all requiring careful coordination. 

Traditional robot scheduling software handles those steps sequentially, one arm at a time. 

ScheduleStream changes that by running computations on GPUs, letting multiple arms plan movements and operate in parallel. The result is a 3x speedup across multi-arm planning scenarios, on hardware such as the NVIDIA Jetson edge AI platform. Code for the framework is available on GitHub.
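To make the sequential-versus-parallel difference concrete, here is a toy sketch. It is not ScheduleStream's algorithm, and the step names, arms and durations are hypothetical. It computes how long a small lab protocol takes when one arm moves at a time versus when each arm runs its own steps concurrently, with each step waiting only on the arm it uses and the steps it depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    arm: str
    duration: float  # seconds (hypothetical values)
    after: list = field(default_factory=list)  # steps that must finish first


# Hypothetical pharmaceutical-lab protocol, listed in a valid dependency order.
STEPS = [
    Step("pick_tube", "arm_a", 4.0),
    Step("pick_reagent", "arm_b", 3.0),
    Step("transfer_liquid", "arm_a", 6.0, after=["pick_tube"]),
    Step("mix_reagents", "arm_b", 5.0, after=["pick_reagent", "transfer_liquid"]),
    Step("cap_tube", "arm_c", 2.0, after=["pick_tube"]),
]


def sequential_makespan(steps):
    """One arm moves at a time: total time is the sum of all durations."""
    return sum(s.duration for s in steps)


def parallel_makespan(steps):
    """Arms run concurrently; a step starts once its arm is free and its
    dependencies are done. Ignores collisions between arms, which a real
    multi-arm planner must also handle."""
    arm_free = {}  # arm -> time it becomes idle
    finish = {}    # step name -> completion time
    for s in steps:
        ready = max((finish[d] for d in s.after), default=0.0)
        start = max(ready, arm_free.get(s.arm, 0.0))
        finish[s.name] = start + s.duration
        arm_free[s.arm] = finish[s.name]
    return max(finish.values())


if __name__ == "__main__":
    seq = sequential_makespan(STEPS)
    par = parallel_makespan(STEPS)
    print(f"sequential: {seq:.1f}s, parallel: {par:.1f}s, speedup: {seq / par:.2f}x")
```

With these made-up durations the protocol drops from 20 seconds to 15, because dependencies (mixing must wait for the liquid transfer) cap how much the arms can overlap. Finding good overlaps while also planning collision-free motions is the expensive part that motivates GPU acceleration.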

 

A robot that learns to navigate through a space, avoiding obstacles and finding its destination, usually learns to do it in one body. Put the same navigation software into a differently shaped robot and it often falls apart, because the new body moves differently. 

The COMPASS policy framework addresses this by first learning a baseline navigation policy with imitation learning, then using residual reinforcement learning in NVIDIA Isaac Lab to train specialists for diverse robot embodiments. Crucially, no real-world robot data is involved at any stage: everything is trained in Isaac Lab simulation. 
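The core idea of residual reinforcement learning can be sketched in a few lines. This is a generic illustration, not COMPASS's code: `base_policy` and `ResidualPolicy` are hypothetical stand-ins, and the linear residual is an assumption chosen for brevity. A frozen base policy proposes an action, and a small per-embodiment correction learned with RL is added on top.

```python
import numpy as np


def base_policy(goal_offset):
    """Hypothetical stand-in for a shared, imitation-learned navigation
    policy: command a velocity proportional to the offset to the goal."""
    return 0.5 * goal_offset


class ResidualPolicy:
    """Hypothetical per-embodiment residual. In residual RL only these
    weights are trained; the base policy stays frozen."""

    def __init__(self, obs_dim, act_dim, scale=0.1):
        # Zero init: before training, the composed policy equals the base policy.
        self.W = np.zeros((act_dim, obs_dim))
        # Bound the correction so the base behavior dominates.
        self.scale = scale

    def __call__(self, goal_offset):
        return self.scale * np.tanh(self.W @ goal_offset)


def act(goal_offset, residual, max_speed=1.0):
    """Composed action = base action + learned residual, clipped to limits."""
    action = base_policy(goal_offset) + residual(goal_offset)
    return np.clip(action, -max_speed, max_speed)


if __name__ == "__main__":
    residual = ResidualPolicy(obs_dim=2, act_dim=2)
    print(act(np.array([1.0, -2.0]), residual))  # equals the base action while W is zero
```

Starting the residual at zero and bounding it with `tanh` means each embodiment-specific specialist begins from the shared behavior and only learns the corrections its body needs, rather than relearning navigation from scratch.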

Compared with an imitation learning baseline, COMPASS achieved a 4.5x improvement in average success rate. It also seamlessly transfers to real-world environments, demonstrating around 80% success across 20 real-world navigation trials on autonomous mobile robots and humanoids. 

COMPASS is agent-friendly, with dedicated skills — and developers can connect the pipeline with NVIDIA Omniverse NuRec to post-train and validate robots in a digital twin of a novel environment before deployment. 

Most grasping systems identify the object, predict a grasp, plan a path, then execute. But the last few centimeters are where small errors matter most.

Grasp-MPC, which builds on model predictive control (MPC), computes grasps adaptively, continuously correcting the robot’s motion as it closes in on the object rather than carrying out a fixed plan, much as a person grabs something by feel rather than calculating every joint angle in advance.
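The difference between a fixed plan and closed-loop correction can be illustrated with a generic sampling-based MPC loop. This sketch does not reproduce Grasp-MPC's learned cost; the plain distance cost, the noise model in `estimate_target` and all parameter values are assumptions. At every step the robot re-observes the object, samples candidate motions, scores them, executes only the best one, and repeats.

```python
import numpy as np


def estimate_target(true_target, gripper, rng):
    """Hypothetical perception model: the object's position estimate gets
    less noisy as the gripper approaches (noise scales with distance)."""
    dist = np.linalg.norm(true_target - gripper)
    return true_target + rng.normal(scale=0.05 * dist, size=3)


def mpc_step(gripper, target_est, rng, n_samples=64, max_step=0.02):
    """Sample candidate motions, score each by predicted distance to the
    current target estimate, and return the best one."""
    candidates = rng.uniform(-max_step, max_step, size=(n_samples, 3))
    predicted = gripper + candidates
    costs = np.linalg.norm(predicted - target_est, axis=1)
    return candidates[np.argmin(costs)]


def closed_loop_grasp(start, true_target, steps=100, seed=0):
    """Receding-horizon loop: re-observe, re-plan and move every step."""
    rng = np.random.default_rng(seed)
    gripper = np.asarray(start, dtype=float)
    for _ in range(steps):
        target_est = estimate_target(true_target, gripper, rng)
        gripper = gripper + mpc_step(gripper, target_est, rng)
    return gripper


if __name__ == "__main__":
    target = np.array([0.1, 0.05, 0.0])
    final = closed_loop_grasp([0.0, 0.0, 0.3], target)
    print("final error (m):", np.linalg.norm(final - target))
```

Because the estimate is refreshed each step, early perception errors are corrected as the gripper gets closer, which is exactly where a plan computed once at the start would have accumulated its error.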

To build the policy, the researchers generated 2 million simulated trajectories across 8,000 objects using annotations from the GraspGen dataset and motion planning data from cuRobo, a CUDA-accelerated library for robot motion generation. 

After training on both successful and failed trajectories, Grasp-MPC learned to grasp novel objects in cluttered tabletops and shelves — achieving around 75% overall success on real robots, compared with a baseline of 41%.

 

Deformable Cluster Manipulation introduces a framework for a related challenge: enabling robots to grasp not just one object, but a whole bundle of flexible, tangled material at once. 

The framework was motivated by a real-world task: clearing a mass of tree branches that have grown over a power line, where there’s no single clean object to grab. The system uses its entire arm, not just the gripper: wrapping it around the branch cluster and sweeping it aside, the way someone might gather an armful of cables or push a tangle of brush out of the way. 

The researchers built a tree generator using biological growth equations to create synthetic trees of many different shapes and sizes — then trained the system across thousands of them in NVIDIA Isaac open simulation frameworks. 

The policy deploys zero-shot to real branches, with no additional real-world training. Beyond power lines, the researchers see potential in cable management, agricultural inspection and anywhere robots need to handle a tangle rather than a single graspable item.

Clearing tree branches in zero-shot sim-to-real deployment.

Assembling With Precision

Precise assembly — threading a nut onto a bolt, inserting a gear onto a gearshaft, pressing a peg into a hole — is notoriously hard to get right with simulation alone. 

The real world is complex. Real surfaces aren’t perfectly smooth, sensors don’t always behave as specified, and tiny discrepancies that a simulator ignores can stop a robot in its tracks.

The SPARR method addresses this by splitting the job in two. A policy trained in Isaac Lab learns the general strategy for the assembly task in simulation. Then, on the actual hardware, a second layer learns to correct for whatever the simulator got wrong — using the robot’s own camera and without any human demonstrations or guidance. 

SPARR improves success rates by 38% and reduces cycle time by around 30% compared with zero-shot sim-to-real baselines. 

On National Institute of Standards and Technology (NIST) assembly tasks not seen during training, success improves by nearly 75% — approaching the results of methods that require a human in the loop.

The Refinery framework takes on the next layer of difficulty in assembly: tasks with multiple sequential steps, where how step one is finished determines whether step two is even possible. It’s like assembling furniture — leave a panel at the wrong angle, and the next fastener won’t go in. 

By understanding how success varies across initial conditions and training across hundreds of simulated assembly scenarios, Refinery learns to complete each step while leaving each component in a position that sets up the next. It achieves 91% success in simulation and a mean improvement of nearly 11% over baselines, with comparable results in the real world, and its policies can be chained to handle long, multi-part sequences.

Action Models That Keep Their Word

The PEEK pipeline helps robots see past the clutter. In a typical manipulation task, the robot’s camera picks up everything in the scene — but most of it is irrelevant noise. 

One task demonstrated on the PEEK project page is “give the banana to NVIDIA founder and CEO Jensen Huang”: a photo of Huang sits on a table alongside a photo of Michael Jordan, a collection of unrelated objects and other distractors. 

A human doing the task instantly focuses on the banana and the right photo; a standard robot policy has to process everything and often gets confused. PEEK addresses this by having a vision language model read the task instruction and annotate the robot’s view accordingly: it draws the movement path, highlights the objects that matter and fades out everything else. 

The policy then acts on that annotated view rather than the raw scene. For a policy trained purely in simulation, adding PEEK produced a 41x improvement in real-world accuracy. For large VLA models and smaller policies, gains ranged from 2x to 3.5x. Because it works at the image level, PEEK integrates with any camera-based policy without modification.
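Because PEEK operates on images, the annotation step can be pictured as a pure image transform. The sketch below assumes the relevant regions (`boxes`) and the 2D `path` have already been predicted; in PEEK a vision language model produces them, and the exact rendering PEEK uses is not reproduced here. The function name and parameters are hypothetical.

```python
import numpy as np


def annotate_view(image, boxes, path, fade=0.3, path_color=(255, 0, 0)):
    """Keep task-relevant regions at full brightness, dim everything else
    and draw a 2D movement path on top.

    image: H x W x 3 uint8 array
    boxes: list of (x0, y0, x1, y1) pixel boxes (x1, y1 exclusive)
    path:  list of (x, y) pixel waypoints
    """
    out = image.astype(np.float32) * fade            # fade the whole scene
    for x0, y0, x1, y1 in boxes:                      # restore relevant regions
        out[y0:y1, x0:x1] = image[y0:y1, x0:x1]
    for (xa, ya), (xb, yb) in zip(path, path[1:]):    # draw each path segment
        n = max(abs(xb - xa), abs(yb - ya)) + 1
        xs = np.linspace(xa, xb, n).round().astype(int)
        ys = np.linspace(ya, yb, n).round().astype(int)
        out[ys, xs] = path_color
    return out.astype(np.uint8)


if __name__ == "__main__":
    frame = np.full((120, 160, 3), 200, dtype=np.uint8)  # placeholder camera frame
    view = annotate_view(frame, boxes=[(20, 30, 60, 70)], path=[(40, 50), (120, 90)])
    print(view.shape, view.dtype)
```

The output has the same shape and dtype as the input frame, which is why an image-level approach like this can sit in front of an existing camera-based policy without changing the policy itself.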

 

Do What You Say — a collaboration with researchers at Carnegie Mellon University, University of Utah and University of Sydney — addresses a specific failure mode that matters more as robots tackle longer, more complex tasks. 

Give a robot an instruction like “store everything on this table inside the cabinet” or “prepare a Manhattan,” and it has to break that down into individual steps and execute them in sequence. 

The problem is that the AI model can correctly reason through what it needs to do — and then execute something different. 

The method, called SEAL, fixes this at runtime without any retraining: the robot generates several candidate action sequences, reasons through where each one would actually lead and picks the one whose outcome matches what it said it would do. SEAL delivers accuracy gains of up to 15% over prior work and stays robust to rephrased instructions, changed objects, scene clutter and shifted camera angles.
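The selection logic described above, generate candidates, predict each outcome and keep the one consistent with the stated plan, can be sketched as a best-of-N search. In SEAL the model itself proposes candidates and predicts their outcomes; here a hand-written symbolic simulator stands in for that prediction, and `simulate`, `consistency` and `select_action_sequence` are hypothetical names chosen for illustration.

```python
from typing import Dict, Sequence, Tuple

State = Dict[str, str]          # object -> location
Action = Tuple[str, str, str]   # ("move", object, destination)


def simulate(state: State, actions: Sequence[Action]) -> State:
    """Hypothetical symbolic world model: predict where objects end up."""
    new_state = dict(state)
    for verb, obj, dest in actions:
        if verb == "move" and obj in new_state:
            new_state[obj] = dest
    return new_state


def consistency(predicted: State, intended: State) -> float:
    """Fraction of the stated goals that the predicted outcome satisfies."""
    hits = sum(predicted.get(obj) == loc for obj, loc in intended.items())
    return hits / len(intended)


def select_action_sequence(state, intended, candidates):
    """Pick the candidate whose predicted outcome best matches what the
    model said it would do (first one wins on ties)."""
    return max(candidates, key=lambda c: consistency(simulate(state, c), intended))


if __name__ == "__main__":
    table = {"cup": "table", "plate": "table", "spoon": "table"}
    goal = {obj: "cabinet" for obj in table}  # "store everything ... inside the cabinet"
    candidates = [
        [("move", "cup", "cabinet"), ("move", "plate", "sink")],
        [("move", o, "cabinet") for o in ("cup", "plate", "spoon")],
    ]
    print(select_action_sequence(table, goal, candidates))
```

Since selection happens purely at inference time over sampled candidates, this style of check needs no retraining, matching the runtime-only property the paper emphasizes.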

 

Beyond papers, NVIDIA is expanding robotics research infrastructure with large-scale open datasets. The NVIDIA Physical AI Dataset is the world’s largest open dataset for physical AI development, with more than 15 million downloads, while NVIDIA Isaac GR00T X Embodiment Sim has become one of the most-downloaded robotics datasets.  

Universities Accelerate Physical AI Research With NVIDIA Technologies

Robotics teams from universities such as Carnegie Mellon University (CMU), ETH Zurich, MIT and the University of Texas at Austin are tapping NVIDIA technologies to move physical AI research from simulation to real-world systems, with nearly 50 accepted papers referencing NVIDIA-accelerated simulation, robot learning and compute.

Examples include a paper from CMU demonstrating a robotic control framework trained in NVIDIA Isaac Lab and MIT work on large language model-guided reinforcement learning powered by NVIDIA GPUs.

Explore NVIDIA Research’s physical AI work. Developers can get started with Isaac Lab and Isaac Sim.

Stay up to date by subscribing to our newsletter, and following NVIDIA Robotics on LinkedIn, Instagram, X and Facebook.

To start your robotics journey, enroll in our free NVIDIA Robotics Fundamentals courses today.



NVIDIA GTC Showcases Virtual Worlds Powering the Physical AI Era



Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

NVIDIA GTC last week showcased a turning point in physical AI: Robots, vehicles and factories are scaling from single use cases and isolated deployments to sophisticated enterprise workloads across industries. 

At the center of this shift are new frontier models for physical AI, including NVIDIA Cosmos 3, NVIDIA Isaac GR00T N1.7 and NVIDIA Alpamayo 1.5. 

NVIDIA also released the NVIDIA Physical AI Data Factory Blueprint, designed to push the state of the art in world modeling, humanoid skills and autonomous driving, as well as the NVIDIA Omniverse DSX Blueprint for AI factory digital twin simulation.

Open-source agentic frameworks such as OpenClaw extend the AI stack all the way to operations, enabling long-running “claws” that use tools, memory and messaging interfaces to orchestrate workflows, manage data pipelines and execute tasks autonomously on dedicated machines. 

“With NVIDIA and the broader ecosystem, we’re building the claws and guardrails that let anyone create powerful, secure AI assistants,” said Peter Steinberger, creator of OpenClaw, in an NVIDIA press release from GTC. 

OpenUSD is a driving force behind the scalability of physical AI, providing a common scene-description language that lets teams bring computer-aided design (CAD) data, simulation assets and real-world telemetry into a shared, physically accurate view of the world. 

Simulating the AI Factory Before It’s Built

Modern AI factories are complex, spanning thermals, power grids, network load and mechanical systems. Simulation makes it much easier to build them on time and on budget. 

To tackle this, NVIDIA introduced the Omniverse DSX Blueprint at GTC, a reference architecture that unifies simulation across every layer of an AI factory through a single digital twin. This enables operators to optimize performance and efficiency before a rack is installed in the real world.

Compute Is Data: Real-World Data Is No Longer the Moat

Real-world data used to function as a moat for physical AI — but it doesn’t scale. The real world is messy, unpredictable and full of edge cases, and the pipelines to process, simulate and evaluate data are fragmented. The bottleneck isn’t just data — it’s the entire data factory.

To help address this, NVIDIA introduced at GTC its Physical AI Data Factory Blueprint, an open reference architecture that transforms compute into large-scale, high-quality training data. Built on NVIDIA Cosmos open world foundation models and the NVIDIA OSMO operator, it unifies data curation, augmentation and evaluation into a single pipeline, enabling developers to generate diverse, long-tail datasets from limited real-world inputs.

Leading physical AI developers including FieldAI, Hexagon Robotics, Linker Vision, Milestone Systems, Skild AI and Teradyne Robotics are already tapping the blueprint to speed up robotics projects, vision AI agents and autonomous vehicle programs.

Microsoft Azure and Nebius are the first cloud platforms to offer the blueprint, turning world-scale compute into turnkey data production engines.

“Together with cloud leaders, we’re providing a new kind of agentic engine that transforms compute into the high-quality data required to bring the next generation of autonomous systems and robots to life,” said Rev Lebaredian, vice president of Omniverse and simulation technologies at NVIDIA, in this press release. “In this new era, compute is data.”

From OpenUSD to Reality: Seamless Design to Deployment

Converting CAD files to OpenUSD is a critical step in the physical AI pipeline — transforming engineering data into simulation-ready assets that developers can use to build, test and validate robots in physically accurate virtual environments. 

Using tools like the NVIDIA Omniverse Kit software development kit and NVIDIA Isaac Sim, teams can optimize and enrich 3D data for real-time rendering, simulation and collaborative workflows.  

Companies including FANUC and Fauna Robotics are using this seamless CAD-to-OpenUSD workflow to speed up robotic system design and validation.

Transforming Manufacturing and Logistics Through Industrial Digital Twins

“Factories themselves are now robotic systems,” Lebaredian said during his special address on digital twins and simulation at GTC. 

All factories are born in simulation. The NVIDIA Mega Omniverse Blueprint provides enterprises with a reference architecture to design, test and optimize robot fleets and AI agents in a physically accurate facility digital twin before a single robot is deployed on the floor. 

KION, working with Accenture and Siemens, is using this blueprint to build large-scale warehouse digital twins that train and test fleets of NVIDIA Jetson-based autonomous forklifts for GXO, the world’s largest pure-play contract logistics provider. 

Physical AI Steps From Simulation to the Real World

NVIDIA is partnering with the global robotics ecosystem — including leading robot brain developers, industrial robot giants and humanoid pioneers — to enhance production-level physical AI. 

ABB Robotics, FANUC, KUKA and Yaskawa, which have a combined global install base of over 2 million robots, are using NVIDIA Omniverse libraries and NVIDIA Isaac simulation frameworks to validate complex robot applications and production lines through physically accurate digital twins. These companies have also integrated NVIDIA Jetson modules into their controllers to enable real-time AI inference. 

Robot development starts with the robot brains, which is why leading developers including FieldAI and Skild AI are building theirs using NVIDIA Cosmos world models for data generation and Isaac simulation frameworks to validate policies in simulation. 

Meanwhile, Generalist AI is using NVIDIA Cosmos to explore synthetic data generation. Combining world-model-generated data with simulation-based validation can help robots become proficient at tasks ranging from supply chain monitoring to food delivery at an exceptional pace. 

Read all of NVIDIA’s announcements from GTC on this online press kit and watch the keynote replay. Catch up on all Physical AI Days sessions from GTC and watch the developer livestream replay.