Real2Sim: A Revolutionary Replacement for Sim2Real by NVIDIA and Fei-Fei Li's New System
NVIDIA GEAR and Fei-Fei Li's team have developed SimFoundry, which can automatically generate robot simulation environments from real-world videos, enabling nearly infinite training data through object swapping and task generation.
PUBLISHED: 2026-07-05T06:59:11.000Z
In the field of robotics, Real2Sim (transforming reality into simulations) has emerged as a groundbreaking approach, poised to replace the traditional Sim2Real (transferring simulations to reality).
Recently, NVIDIA GEAR, in collaboration with Professor Fei-Fei Li’s team at Stanford University and researchers from the Georgia Institute of Technology, UT Austin, and the University of Toronto, announced a new Real2Sim system called “SimFoundry,” as reported by QbitAI.
SimFoundry is a system capable of automatically generating simulation environments for robot interaction, training, and evaluation from a single real-world video. It goes beyond mere reconstruction of 3D scenes by retaining the functionality and affordances of objects, enabling automatic object swapping, scene layout adjustments, and even creating new operational tasks. Its most remarkable feature is the ability to not only generate a single simulation scene from one video but also to automatically expand it into an almost infinite data generation space.
SimFoundry is not just about training robots in simulation; it also accurately predicts the real-world performance of different robot strategies (policies). Furthermore, strategies trained on data generated by SimFoundry have been deployed zero-shot on real robots, transferring successfully to the real world on tasks such as multi-step manipulation, bimanual coordination, and articulated object manipulation.
Why Real2Sim is Gaining Attention
Traditional Sim2Real approaches involve training robots in simulation environments and then transferring their strategies to the real world. However, building simulation environments has required extensive manual modeling, making it difficult to fully replicate the complex geometry and physical properties of the real world. As a result, strategies that perform well in simulations often fail to meet expectations in real-world applications.
Real2Sim, on the other hand, seeks to resolve this issue from the opposite perspective. By building simulation environments based on real-world data, the realism of simulations significantly improves. According to the QbitAI article, while collecting real-world data can be costly and time-consuming, once the simulation environment has been built, automated data generation techniques can synthesize diverse, high-quality training data at a very low human cost.
Existing Real2Sim methods have faced challenges. While some excel at reconstructing 3D scenes, they fail to generate training data. Others can evaluate strategies but rely heavily on manual configuration. Most approaches only address isolated aspects of the problem. SimFoundry is innovative in that it integrates scene construction, data generation, strategy evaluation, and training into a single, cohesive pipeline.
Three Core Functions
The SimFoundry system performs three main functions:
- Automatic Reconstruction of Interactive and Simulatable Digital Twins: SimFoundry automatically generates virtual environments that accurately replicate real-world scenes.
- Automatic Expansion of Digital Cousins for Continuous Data Generation: By creating “digital cousins” (variations of scenes that maintain the functionality and interaction potential while altering objects, layouts, and tasks), SimFoundry can automatically generate diverse training data from a single scene.
- Simultaneous Strategy Evaluation and Training: Leveraging these simulation environments, SimFoundry conducts strategy evaluation and training simultaneously, forming a complete closed-loop process of transitioning between reality and simulation.
The Three Stages of the Pipeline
The SimFoundry process consists of three stages:
1. Extraction: The system takes a standard RGB video as input and uses depth estimation to reconstruct 3D point clouds. Vision-language models (VLMs) and segmentation models like SAM 3 then recognize and segment objects within the scene. As each object is extracted, inpainting is used to remove it from the video, and this process is repeated until the entire scene is analyzed.
2. Generation: For each extracted object, SimFoundry uses 2D-to-3D models to generate 3D meshes, which are then combined with pose estimation models like FoundationPose to recover each object's actual position and orientation. For articulated objects like drawers or doors, SimFoundry automatically derives their joint structures, fills in physical properties such as mass and friction, and generates collision models to address penetration issues. Finally, it exports simulation scenes that can be run directly in simulation frameworks like IsaacLab, completing the construction of a digital twin.
3. Augmentation: The core innovation of SimFoundry lies in this stage. Based on the digital twin, the system automatically generates digital cousins in three dimensions:
- Object Cousins: Altering the appearance or geometry of objects while maintaining functionality.
- Scene Cousins: Adjusting object layouts or adding new objects to create new scenes.
- Task Cousins: Automatically deriving new robot operation tasks based on the objects and their affordances in the scene.
This means that from a single real-world video, not only can a digital twin be reconstructed, but new objects, scenes, and tasks with similar operational semantics can also be generated in large quantities, providing robots with almost infinite training data.
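To make the combinatorial growth concrete, here is a minimal sketch of how the three cousin axes could multiply a single twin into many variants. It is not SimFoundry's code: `SceneSpec`, the helper functions, the asset names, and the use of a layout seed to stand in for an object arrangement are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class SceneSpec:
    """Hypothetical scene description (not SimFoundry's actual schema)."""
    objects: tuple[str, ...]  # asset names placed in the scene
    layout_seed: int          # stands in for one concrete object layout
    task: str                 # natural-language task label


def object_cousins(scene, swaps):
    """Swap one object at a time for a functionally similar asset."""
    cousins = []
    for i, name in enumerate(scene.objects):
        for alt in swaps.get(name, ()):
            objs = scene.objects[:i] + (alt,) + scene.objects[i + 1:]
            cousins.append(replace(scene, objects=objs))
    return cousins


def scene_cousins(scene, layout_seeds):
    """Keep objects and task, change only the layout."""
    return [replace(scene, layout_seed=s)
            for s in layout_seeds if s != scene.layout_seed]


def task_cousins(scene, tasks):
    """Keep objects and layout, change only the task."""
    return [replace(scene, task=t) for t in tasks if t != scene.task]


def expand(twin, swaps, layout_seeds, tasks):
    """Cross the three axes: object variants x layouts x tasks (twin included)."""
    object_sets = [twin.objects] + [c.objects for c in object_cousins(twin, swaps)]
    seeds = [twin.layout_seed] + [c.layout_seed for c in scene_cousins(twin, layout_seeds)]
    task_list = [twin.task] + [c.task for c in task_cousins(twin, tasks)]
    return [SceneSpec(o, s, t) for o, s, t in product(object_sets, seeds, task_list)]


# Illustrative inputs only; none of these values come from the article.
twin = SceneSpec(objects=("mug", "drawer"), layout_seed=0,
                 task="put the mug in the drawer")
variants = expand(
    twin,
    swaps={"mug": ("cup", "bowl")},
    layout_seeds=range(4),
    tasks=("open the drawer", "close the drawer"),
)
print(len(variants))
```

With these toy inputs, 3 object sets, 4 layouts, and 3 tasks yield 36 scene variants from one twin. A real system would also have to check that each combination is physically and semantically valid, which this sketch does not attempt.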
Experimental Results Demonstrating Effectiveness
Researchers conducted experiments on two robotic platforms across seven typical manipulation tasks to validate two core capabilities: Real-to-Sim strategy evaluation and Sim-to-Real strategy training.
The results showed that robot performance in SimFoundry closely matched real-world performance, with an average Pearson correlation coefficient of 0.911 and a Mean Maximum Rank Violation (MMRV) of just 0.018, significantly outperforming the state-of-the-art evaluation framework, PolaRiS. This indicates that the real-world performance of a strategy can be predicted with relatively high accuracy from its performance in SimFoundry, reducing the need for costly physical testing.
Notably, the study highlighted the effectiveness of digital cousins. Compared with training on digital twins alone, introducing Object, Scene, and Task cousins increased the average real-world task success rate by 17%, 21%, and 40%, respectively.
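For readers unfamiliar with the two metrics, the following sketch shows how they can be computed from paired success rates. The article does not give the MMRV formula, so the definition used here is an assumption taken from prior simulated-evaluation work: for each policy, take the largest real-world performance gap among the policy pairs whose ordering the simulator gets wrong, then average over policies. The success rates are made-up numbers, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mmrv(real, sim):
    """Mean Maximum Rank Violation (assumed definition, see text above).

    A pair (i, j) is a rank violation when simulation and reality disagree
    on which policy is better; its cost is the real-world performance gap.
    """
    n = len(real)
    total = 0.0
    for i in range(n):
        worst = 0.0
        for j in range(n):
            flipped = (sim[i] < sim[j]) != (real[i] < real[j])
            if flipped:
                worst = max(worst, abs(real[i] - real[j]))
        total += worst
    return total / n


# Hypothetical success rates for four policies (illustrative only).
real_success = [0.20, 0.50, 0.60, 0.90]
sim_success = [0.25, 0.55, 0.50, 0.85]

r = pearson(real_success, sim_success)
v = mmrv(real_success, sim_success)
print(f"Pearson r = {r:.3f}, MMRV = {v:.3f}")
```

In this toy data the simulator swaps the order of the second and third policies, whose real success rates differ by 0.10, so two of the four policies each contribute 0.10 and the MMRV is 0.05. A higher Pearson value and a lower MMRV both indicate that simulated results track real ones more closely.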
Additionally, strategies trained solely on data generated automatically by SimFoundry achieved near-perfect success rates in multiple manipulation tasks when deployed on real robots in a zero-shot manner. These results strongly support the efficacy of the Real2Sim approach and the practicality of SimFoundry.
Research Team and Future Prospects
The authors of SimFoundry include leading researchers from NVIDIA GEAR, Georgia Tech, Stanford University, UT Austin, and the University of Toronto. The first author, Nadun Ranawaka Arachchige, is a Georgia Tech alumnus currently interning at NVIDIA GEAR under the mentorship of Danfei Xu. NVIDIA GEAR is a world-leading research group in embodied AI.
The Real2Sim approach has the potential to revolutionize data efficiency and training scalability in robotics. By addressing the simulation-to-reality gap through the reverse concept of constructing simulations from real-world data, SimFoundry could drastically shorten the development cycle for robotics strategies.
However, as noted in the QbitAI article, further validation is needed to ensure that the digital cousins generated by SimFoundry consistently align with the physical constraints of the real world. Additionally, the computational resources required for building simulation environments remain a significant factor in determining its practicality.
Editorial Opinion
SimFoundry challenges the core paradigm of Sim2Real in robotics. The Real2Sim approach of automatically generating simulation environments from real-world data could fundamentally alter the cost structure of robot development. The automatic expansion of “digital cousins” through object swapping and task generation effectively addresses the inefficiencies of manually constructed simulation environments.
In the long term, the direction set by SimFoundry for Real2Sim has the potential to impact not only robotics but also the development of AI systems requiring real-world interactions, such as autonomous driving and drone control. The cycle of constructing simulations from real-world data and using those simulations to generate training data could lead to a significant leap in data efficiency.
However, while SimFoundry’s evaluation methods have shown a high correlation with real-world performance, simulations cannot fully replace reality. Handling unforeseen physical interactions and sensor noise, which are difficult to model in simulations, remains a challenge for the future.
References
- [QbitAI "Fei-Fei Li's New Research: From Sim2Real to Real2Sim"](https://www.qbitai.com/2026/07/443066.html), published on 2026-07-05
Frequently Asked Questions
- How does SimFoundry differ from traditional Sim2Real approaches?
- Traditional Sim2Real involves applying strategies trained in simulation environments to the real world. SimFoundry, on the other hand, adopts a Real2Sim approach, automatically constructing simulation environments from real-world videos and training within these environments. This reduces the gap between simulation and reality.
- What are digital cousins?
- Unlike exact digital twins, digital cousins are simulation environments that maintain the functionality and interaction methods of a scene while making plausible changes to objects, layouts, and tasks. This allows for the automatic generation of diverse training data.
- How cost-effective is SimFoundry?
- While exact cost savings are not specified in the research, SimFoundry automates the construction of simulation scenes and generates nearly infinite training data from a single video, making it significantly more cost-effective than traditional manual methods.