Jiuwen Symbiosis, Giving Bodies to AI Agents, Released as Open Source
The openJiuwen community has released the Physical AI architecture "Jiuwen Symbiosis" as open source. This article explains the design philosophy that brings real-world perception and action to AI agents.
On June 13, the openJiuwen community released the Physical AI architecture “Jiuwen Symbiosis” as open source. Its design, which provides AI agents with a foundation for perception and action in the real world, is drawing attention. The code is available on Gitcode.
The AI industry currently faces a fundamental challenge. While large language models (LLMs) can handle code generation, mathematical reasoning, and advanced dialogue, they cannot perform physical actions like pouring a glass of water by themselves. This contradiction symbolizes the limitation that “AI has no body.”
What Moravec’s Paradox Reveals
Moravec’s paradox, articulated by roboticist Hans Moravec in 1988, captures this situation precisely. For computers, playing chess or solving advanced mathematics is comparatively easy, but actions that even a human infant can perform, such as walking, grasping, avoiding obstacles, and maintaining balance, are surprisingly difficult.
The reason is that these abilities are not the product of logical reasoning but are rooted in bodily intelligence shaped by millions of years of evolution. Current large language models are often likened to “brains in a vat.” They have high IQs but no physical form, and they fundamentally lack an understanding of real-world friction, gravity, and spatial geometry.
The Evolutionary History of Physical AI
The openJiuwen team organizes the evolution of intelligence operating in the physical world into three stages.
Stage 1.0 is manual tasks. It relies on human understanding and performs extremely atomic control operations. Examples include individual movements of a robot arm.
Stage 2.0 is virtual environment training (Sim2Real). Agents learn in simulation environments like Habitat and AI2-THOR and begin to acquire spatial concepts. This stage evolved into an approach that runs multiple models in parallel, with a “brain” model analyzing instructions and driving task execution.
However, this stage exposes several issues.
Four Challenges Facing Current VLA Models
First, a lack of cross-embodiment generalization capability. Once a model is trained, its skill set becomes fixed. Teaching a robot a new task like “open a drawer and grasp an object inside” requires collecting new data and retraining the model. Vision-Language-Action (VLA) models lack combinatorial generalization; they cannot combine “opening a drawer” and “grasping” in a zero-shot manner.
Second, insufficient ability for long-horizon composite tasks. Short-range atomic operations (e.g., “grasp the red block”) are easy, but for composite tasks like “retrieve a tray from the material shelf, bypass the equipment, load it into the machine, press the confirm button, and return to the original position,” a single VLA model struggles with task decomposition, subtask organization, and anomaly recovery.
Third, difficulty in fault identification. Current models compress vision, language understanding, physical reasoning, and action generation into a single Transformer. When execution failures occur (e.g., grasping misalignment, collisions), it is impossible to pinpoint whether the cause is perceptual misrecognition, language ambiguity, physical reasoning errors, or control trajectory divergence.
Fourth, low success rates and poor stability. End-to-end foundation models are typically black-box structures that directly output low-level motion commands such as joint positions and postures. Since a large model handles both cognitive judgment and motor control, implementation difficulty is high, and model stability is compromised.
As Stage 3.0, the openJiuwen team positions the “symbiosis era” that Jiuwen Symbiosis aims for. It seeks to blur the boundary between the virtual and real worlds, enabling the agent to truly understand physical laws and output action sequences that directly control the low-level topology of hardware.
The Arrival of the Agent Era and the Need for Physical AI
Since 2023, agents have become one of the most notable directions in the AI field. With the emergence of Tool Calling, Function Calling, MCP, Browser Agents, and Computer Use Agents, agents are beginning to gain the ability to manipulate the world. However, the objects these agents operate on are still limited to the digital world.
The openJiuwen team believes it is time for agents to expand into the real world. Physical AI agents operate in a fundamentally different way from traditional agents. Traditional agents are based on text input and output, while Physical AI agents require interaction with, and feedback from, the real environment.
The team further points out that the process of humans performing tasks is a continuous real-time system of observation and feedback. In theory, a pipeline of “sensor → VLM, LLM, Planner, ROS” looks elegant, but in practice, it often becomes a complex JSON-based stack. The more complex the system, the more opaque the agent’s thinking process becomes.
The Design Philosophy of Jiuwen Symbiosis: Transparent Situation Awareness
The design philosophy of Jiuwen Symbiosis comes down to this: “The agent’s thinking process should be observable, debuggable, and collaborative.” The team adopted an approach that explicitly exposes the agent’s internal state rather than hiding it in a black box.
The cognitive layer and execution layer collaborate through a shared Workspace to solve complex task execution. This ensures cognitive accuracy and rapid response while greatly simplifying cross-embodiment adaptation. The core skeleton is called the “Situation Awareness Loop.”
Based on this loop, the team has added key technical modules: safe planning, state awareness, observation feedback, and spatial memory.
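As a rough illustration of how a cognitive layer and an execution layer might collaborate through a shared Workspace, the Python sketch below uses a lock-protected dictionary plus an append-only log, so that every write stays observable. The `Workspace` class, the `cognitive_layer` and `execution_layer` functions, and the key names are all hypothetical and are not taken from the Jiuwen Symbiosis code base.

```python
import threading
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Workspace:
    """Hypothetical shared blackboard; not the actual Jiuwen Symbiosis API."""
    _state: dict[str, Any] = field(default_factory=dict)
    _log: list[tuple[str, str, Any]] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def write(self, author: str, key: str, value: Any) -> None:
        # Every write is logged so the agent's reasoning stays observable.
        with self._lock:
            self._state[key] = value
            self._log.append((author, key, value))

    def read(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._state.get(key, default)

    def history(self) -> list[tuple[str, str, Any]]:
        # Return a copy so callers cannot mutate the log.
        with self._lock:
            return list(self._log)


def cognitive_layer(ws: Workspace, target: str) -> None:
    # Deliberate layer: turns a target into a skill proposal.
    ws.write("cognitive", "proposal", {"skill": "grasp", "target": target})


def execution_layer(ws: Workspace) -> None:
    # Fast layer: picks up the latest proposal and reports the outcome.
    proposal = ws.read("proposal")
    if proposal is not None:
        ws.write("execution", "result", {"skill": proposal["skill"], "ok": True})


ws = Workspace()
cognitive_layer(ws, "red block")
execution_layer(ws)
```

Calling `ws.history()` afterwards returns the ordered list of who wrote what, which is the kind of trace that makes a failed run debuggable.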
Details of the Five Functional Modules
Multimodal Perception enables the Physical AI agent to actively perceive the world. It separates understanding from judgment, thoroughly understanding the scene before executing actions, and generates structured world states (detected objects, object poses, confidence levels, etc.).
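A structured world state of this kind could be represented roughly as follows. This is a minimal sketch: the `DetectedObject` and `WorldState` types, the field names, the coordinate frame, and the sample values are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    label: str
    position_m: tuple[float, float, float]  # x, y, z in metres (assumed frame)
    confidence: float                       # detector score in [0, 1]


@dataclass(frozen=True)
class WorldState:
    objects: tuple[DetectedObject, ...]

    def find(self, label: str, min_confidence: float = 0.5):
        """Return the most confident detection of `label`, or None."""
        matches = [o for o in self.objects
                   if o.label == label and o.confidence >= min_confidence]
        return max(matches, key=lambda o: o.confidence, default=None)


# Made-up sample detections for illustration only.
state = WorldState(objects=(
    DetectedObject("red_block", (0.42, -0.10, 0.03), 0.93),
    DetectedObject("red_block", (0.40, -0.12, 0.03), 0.41),
    DetectedObject("tray", (0.80, 0.25, 0.00), 0.88),
))
target = state.find("red_block")
```

Keeping the state immutable (`frozen=True`) means a planner can reason over a snapshot without it changing underneath, at the cost of building a new snapshot after each observation.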
Safe Planning performs task planning based on prompt-based task instructions and structured world states. It dynamically assigns values to relevant skill parameters, verifies physical feasibility, safety, and constraints, and rejects infeasible plans.
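The idea of checking a plan and rejecting infeasible ones can be sketched as a validator that runs before anything is sent to hardware. The `SkillCall` type, the skill names, and the numeric limits below are assumptions chosen for illustration; real limits depend on the robot and its environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCall:
    skill: str
    target_xyz: tuple[float, float, float]
    speed_mps: float


# Illustrative limits; real values depend on the robot and the cell layout.
MAX_REACH_M = 0.85
MAX_SPEED_MPS = 0.5
KNOWN_SKILLS = {"move", "grasp", "place"}


def check_step(step: SkillCall) -> list[str]:
    """Return a list of violated constraints (empty means the step passes)."""
    problems = []
    if step.skill not in KNOWN_SKILLS:
        problems.append(f"unknown skill: {step.skill}")
    x, y, z = step.target_xyz
    if (x * x + y * y + z * z) ** 0.5 > MAX_REACH_M:
        problems.append("target outside reachable workspace")
    if z < 0.0:
        problems.append("target below table surface")
    if not 0.0 < step.speed_mps <= MAX_SPEED_MPS:
        problems.append("speed outside allowed range")
    return problems


def validate_plan(plan: list[SkillCall]) -> tuple[bool, list[str]]:
    # Reject the whole plan if any single step is infeasible.
    problems = [f"step {i}: {p}" for i, s in enumerate(plan) for p in check_step(s)]
    return (not problems, problems)


good = [SkillCall("move", (0.4, 0.1, 0.2), 0.3),
        SkillCall("grasp", (0.4, 0.1, 0.05), 0.1)]
bad = [SkillCall("move", (1.2, 0.0, 0.2), 0.3)]
ok_good, _ = validate_plan(good)
ok_bad, reasons = validate_plan(bad)
```

Returning the list of violated constraints, rather than a bare boolean, gives the planner something concrete to revise against.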
Physical Action follows the skill proposals and invokes the atomic capabilities of related Action Tools. Ultimately, it executes continuous, controllable physical motions such as moving, grasping, setting, and interacting.
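One plausible way to map skill proposals onto atomic Action Tools is a small registry keyed by skill name. The sketch below is hypothetical throughout: the `action_tool` decorator, the `move` and `grasp` tools, and the proposal format are invented for illustration, and the tools return strings where a real system would command a robot driver.

```python
from typing import Callable

# Hypothetical registry mapping skill names to atomic action tools.
ACTION_TOOLS: dict[str, Callable[..., str]] = {}


def action_tool(name: str):
    """Decorator that registers a function as an atomic action tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        ACTION_TOOLS[name] = fn
        return fn
    return register


@action_tool("move")
def move(x: float, y: float, z: float) -> str:
    # A real tool would send a motion command to the robot driver here.
    return f"moved to ({x:.2f}, {y:.2f}, {z:.2f})"


@action_tool("grasp")
def grasp(width_m: float) -> str:
    return f"gripper closed to {width_m:.3f} m"


def execute(proposal: list[dict]) -> list[str]:
    """Run each proposed skill in order through its registered tool."""
    results = []
    for step in proposal:
        tool = ACTION_TOOLS.get(step["skill"])
        if tool is None:
            raise KeyError(f"no action tool registered for {step['skill']!r}")
        results.append(tool(**step["params"]))
    return results


log = execute([
    {"skill": "move", "params": {"x": 0.4, "y": 0.1, "z": 0.2}},
    {"skill": "grasp", "params": {"width_m": 0.03}},
])
```

Under this arrangement, adapting to a different robot body would mean re-registering tools under the same skill names while leaving the planning side unchanged.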
Observation collects the real-world state after a physical action has been executed and extracts it in structured form. It acquires execution results through vision and other sensors, recognizing key information such as object poses, environmental changes, and interaction effects. The resulting structured observation of the world state provides an objective basis for the deviation calculations in the subsequent feedback step.
Feedback builds a closed-loop correction mechanism based on observation results. It feeds back execution deviations, abnormal states, and success/failure judgments to the reasoning and planning modules. This enables real-time adjustment of motion parameters, dynamic optimization of planning sequences, and autonomous recovery from abnormal scenes, while also accumulating interaction data.
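The observe-then-correct cycle can be illustrated with a toy one-dimensional example: a simulated actuator overshoots by a fixed bias, the observation step measures where the gripper actually ended up, and the feedback step shifts the next command by the measured deviation. The functions `act`, `observe`, and `run_closed_loop`, along with the bias and tolerance values, are assumptions made for this sketch and stand in for real hardware and perception.

```python
def act(commanded_x: float) -> float:
    """Simulated actuator with a fixed calibration bias (stand-in for hardware)."""
    return commanded_x + 0.02


def observe(actual_x: float) -> dict:
    # Stand-in for a vision pipeline returning a structured observation.
    return {"gripper_x": actual_x}


def run_closed_loop(goal_x: float, tolerance: float = 0.005, max_attempts: int = 5):
    command = goal_x
    history = []
    for attempt in range(1, max_attempts + 1):
        observation = observe(act(command))
        deviation = observation["gripper_x"] - goal_x
        success = abs(deviation) <= tolerance
        history.append({"attempt": attempt,
                        "deviation": round(deviation, 4),
                        "success": success})
        if success:
            return True, history
        # Feedback: shift the next command to cancel the measured deviation.
        command -= deviation
    return False, history


succeeded, history = run_closed_loop(goal_x=0.40)
```

The `history` list doubles as accumulated interaction data: each entry records the attempt number, the measured deviation, and the success judgment.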
Editorial Opinion
Short-term Impact
The open-source release of Jiuwen Symbiosis can be evaluated as providing a design guideline for the field of Physical AI. In particular, the design that separates the cognitive and execution layers and makes the agent’s internal state transparent has the potential to improve debuggability and reliability in robotics. Over a span of three to six months, it is expected that research institutions and startups facing similar challenges will increasingly reference or adopt the Jiuwen Symbiosis architecture.
Long-term Perspective
Looking at a one- to three-year span, the design philosophy of separating cognition and execution, along with a clear situation awareness loop, could become established as a standard architecture for Physical AI. Especially as the limitations of black-box VLA models become recognized, a shift toward approaches that emphasize modularization and transparency will likely accelerate. Applications in manufacturing and logistics are expected to lead the way, later spreading to service robotics.
Questions from the Editorial Board
How much of a performance difference will the transparent design advocated by Jiuwen Symbiosis make in actual industrial robot control? Data on the trade-offs compared to black-box end-to-end models, particularly in terms of latency and task success rates, are awaited. Furthermore, whether the cognitive-execution collaboration design via a shared Workspace can be applied to multi-robot coordination remains an issue for future verification.
References
- Quantum Bit: “Agents Finally Grow Bodies: The Thinking and Practice Behind Jiuwen Symbiosis” — Published June 13, 2026
Frequently Asked Questions
- What is Jiuwen Symbiosis?
- It is a Physical AI architecture developed by the openJiuwen community. It is designed to provide AI agents with perception and action in the real world, centered on a "Situation Awareness Loop" where the cognitive and execution layers collaborate through a shared Workspace. It is released as open source.
- What is the concept of Physical AI?
- It is a general term for artificial intelligence that operates not only in the digital world but also in the real physical world. While conventional AI specializes in processing text and images, Physical AI performs physical actions, perception, and interaction through robots and other means. Jiuwen Symbiosis provides the foundational architecture for this.
- What is the difference between VLA models and Jiuwen Symbiosis?
- VLA (Vision-Language-Action) models integrate vision, language, and action generation into a single black box, whereas Jiuwen Symbiosis separates the cognitive and execution layers and explicitly exposes the agent's internal state. This makes it easier to identify failure causes and debug, and simplifies cross-embodiment adaptation.