Zhejiang University and Tencent Unveil Role-Playing Framework with AI as "Director"

Zhejiang University and Tencent YouTu Lab have announced "AdaMARP," an immersive role-playing framework where AI integrates thought, actions, and environment, enabling complex storytelling through a four-channel message format and a scene manager.

Reviewed & edited by the SINGULISM Editorial Team

From AI That “Speaks” to AI That “Thinks,” “Acts,” and “Perceives the Environment”

A research team from Zhejiang University and Tencent YouTu Lab has unveiled “AdaMARP,” a framework for immersive and versatile role-playing. It targets two key limitations of traditional role-playing systems built on large language models (LLMs): a lack of immersion and a static interaction structure. The paper has been accepted at ACL 2026, one of the top conferences in natural language processing.

Two Limitations of Traditional AI Role-Playing

Current LLM-based role-playing services primarily rely on conversational exchanges between a user and an AI character. However, the research team has identified two major shortcomings of these systems.

First, environmental information is not adequately modeled, often leaving characters “talking to themselves in an empty room.” In a detective scenario, for instance, environmental cues such as wax marks on a carpet at a crime scene or an unopened letter in a witness’s residence are not mere decorations; they are crucial signals that drive the plot and establish causal links. In traditional systems, these environmental cues are not organically connected to the characters’ thoughts or actions.

Second, the scenes and characters are static, and the story does not progress dynamically. In a back-and-forth Q&A format with a single character, it is challenging to create complex, open-ended narratives where users can collect evidence across multiple locations or dynamically introduce new witnesses.

AdaMARP’s Two Key Innovations: Four-Channel Messaging and a Scene Manager

AdaMARP addresses these challenges with two core innovations: an “immersive messaging format” and an “adaptive framework.”

1. Thought–Action–Environment–Speech: Four-Channel Messaging Format

At the core of AdaMARP, each turn of the interaction is generated by interleaving four channels: Thought, Action, Environment, and Speech. For example, in a scene where Sherlock Holmes interrogates a witness, the following causal chain might unfold:

<The gaslight flickers, and the witness glances unconsciously at the clock on the fireplace> [He is avoiding specifics; he wasn’t at the scene during that time] (tapping the table lightly with his pipe) “Where exactly were you between 8 and 9 on the night of the incident?”

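In this example, the environment description appears in angle brackets, the thought in square brackets, the action in parentheses, and the speech in quotation marks. Environmental cues prompt internal reasoning, which leads to a pressure-inducing action and culminates in interrogative speech. This layering adds depth and realism to the characters’ responses.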
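The bracket conventions in the Holmes example lend themselves to a small parser. The Python sketch below is illustrative only: the mapping from delimiters to channels is inferred from that single example, and the names `SEGMENT` and `parse_message` are hypothetical, not taken from any published AdaMARP code.

```python
import re

# Assumed delimiter-to-channel mapping, inferred from the article's example:
#   <...>  environment    [...]  thought    (...)  action    "..."  speech
# Both straight and curly double quotes are accepted for speech.
SEGMENT = re.compile(
    r"<(?P<environment>[^<>]*)>"
    r"|\[(?P<thought>[^\[\]]*)\]"
    r"|\((?P<action>[^()]*)\)"
    r"|[\"“](?P<speech>[^\"“”]*)[\"”]"
)


def parse_message(text):
    """Split one message into (channel, content) pairs, in order of appearance."""
    segments = []
    for match in SEGMENT.finditer(text):
        channel = match.lastgroup  # name of the one alternative that matched
        segments.append((channel, match.group(channel).strip()))
    return segments


message = (
    "<The gaslight flickers, and the witness glances at the clock> "
    "[He is avoiding specifics] "
    "(tapping the table lightly with his pipe) "
    "“Where exactly were you between 8 and 9?”"
)

for channel, content in parse_message(message):
    print(f"{channel:12s} {content}")
```

Keeping the segments as an ordered list, rather than a dictionary keyed by channel, preserves the causal order from cue to thought to action to speech. This sketch does not handle nested delimiters or text that falls outside all four channels.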

2. Scene Manager with Three Agents and Five Types of Actions

The second key component is the “Scene Manager,” which serves as the “director” of the overall narrative. It orchestrates the flow of the story through five discrete actions:

  • init_scene: Initializes a scene (e.g., 221B Baker Street).
  • pick_speaker: Selects the next speaker and provides context for their dialogue.
  • switch_scene: Changes the location (e.g., from the crime scene to the witness’s apartment).
  • add_role: Introduces new characters midway through the story.
  • end: Concludes the interaction.

The Scene Manager is the third agent alongside the Actor model (the AI playing a character) and the User model (the user). It sits between the two and manages the overarching story progression, deciding, among other things, when to switch scenes or introduce a new witness.
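One way to picture the five actions is as commands applied to a small story state. The Python sketch below makes that concrete under stated assumptions: only the five action names come from the article, while `ManagerAction`, `StoryState`, `apply_action`, `run_script`, and the `argument` and `reason` fields are hypothetical. A hand-written script stands in for the decisions a real Scene Manager model would make.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ActionType(Enum):
    # The five action names listed in the article.
    INIT_SCENE = "init_scene"
    PICK_SPEAKER = "pick_speaker"
    SWITCH_SCENE = "switch_scene"
    ADD_ROLE = "add_role"
    END = "end"


@dataclass
class ManagerAction:
    kind: ActionType
    argument: str = ""  # hypothetical: a scene, speaker, or role name
    reason: str = ""    # hypothetical: the manager's stated rationale


@dataclass
class StoryState:
    scene: Optional[str] = None
    roles: List[str] = field(default_factory=list)
    speaker: Optional[str] = None
    finished: bool = False


def apply_action(state, action):
    """Update the story state in place for one Scene Manager action."""
    if state.finished:
        raise ValueError("the story has already ended")
    if action.kind in (ActionType.INIT_SCENE, ActionType.SWITCH_SCENE):
        state.scene = action.argument
    elif action.kind is ActionType.ADD_ROLE:
        if action.argument not in state.roles:
            state.roles.append(action.argument)
    elif action.kind is ActionType.PICK_SPEAKER:
        if action.argument not in state.roles:
            raise ValueError(f"unknown role: {action.argument}")
        state.speaker = action.argument
    elif action.kind is ActionType.END:
        state.finished = True
    return state


def run_script(actions, initial_roles=()):
    """Apply a fixed list of actions; a real manager would choose them turn by turn."""
    state = StoryState(roles=list(initial_roles))
    for action in actions:
        apply_action(state, action)
    return state


# A hand-written script loosely following the Holmes scenario.
script = [
    ManagerAction(ActionType.INIT_SCENE, "midnight crime scene"),
    ManagerAction(ActionType.PICK_SPEAKER, "Holmes", "he should examine the clues"),
    ManagerAction(ActionType.SWITCH_SCENE, "landlord's residence"),
    ManagerAction(ActionType.ADD_ROLE, "Landlord", "a new witness is needed"),
    ManagerAction(ActionType.PICK_SPEAKER, "Holmes", "he begins the interrogation"),
    ManagerAction(ActionType.END),
]

final_state = run_script(script, initial_roles=["Holmes", "Watson"])
print(final_state)
```

Rejecting `pick_speaker` for a role that has not been added is a design choice of this sketch, meant to show why `add_role` has to come before a new witness can speak. The article does not say how AdaMARP itself handles that case.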

Practical Example: Sherlock Holmes Detective Drama Demonstration

The research team showcased the effectiveness of AdaMARP through a demonstration based on a Sherlock Holmes detective story. Beginning at a midnight crime scene, Holmes deduces leads from environmental cues such as wax marks. Following instructions from the Scene Manager, Watson is dispatched to investigate, and the setting shifts to a landlord’s residence. A new witness (the landlord) is introduced, and Holmes begins the interrogation. The sequence shows how the Scene Manager keeps the narrative consistent by issuing “reasoned” actions at each step.

This research suggests that interactions with AI can evolve from mere text-based chats to shared experiences where users and AI navigate through environments and time together. Potential applications span education, entertainment, therapy, and more, opening doors to immersive storytelling and interactive scenarios.

Frequently Asked Questions

What kinds of services could AdaMARP be applied to in the future?
While currently a research framework, AdaMARP could be applied to create highly immersive interactive story experiences in games, educational tools for interacting with historical figures, or training simulations. It has the potential to serve as the foundation for new forms of entertainment and educational software, where users actively drive the narrative.

How does AdaMARP differ from existing AI chatbots?
The biggest difference is that AdaMARP allows AI to "perceive" its environment and actively manage the progression of a story. Traditional chatbots only generate responses based on user input, while AdaMARP enables AI to interpret environmental cues, switch scenes, and introduce new characters for dynamic and unpredictable storytelling. Rather than simply being a conversational partner, the AI takes on the role of a "co-creator" or "director" of the narrative.

Source: 量子位
