Qwen-AgentWorld: Language World Models for General Agents

Qwen-AgentWorld is a framework for language world models: models that predict how an environment changes given the current observation and an action. By simulating agentic environments through long chain-of-thought reasoning, it gives AI agents a way to improve their reasoning and planning without relying solely on real-world interaction.

Foundation Models for Environment Simulation

Qwen-AgentWorld introduces two models, Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, described as the first language world models capable of simulating agentic environments across seven distinct domains. Each model is trained to predict the next state of an environment from the current observation and a specific action.
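The next-state-prediction task can be illustrated with a minimal Python sketch. Everything named here (`Step`, `build_prompt`, `predict_next_state`, the prompt template, and the `toy_generate` stand-in) is a hypothetical illustration, not the released models' interface or prompt format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    observation: str  # text rendering of the current environment state
    action: str       # the agent's action, e.g. a shell command or tool call


def build_prompt(step: Step) -> str:
    """Format one (observation, action) pair as a next-state query.

    The template is an assumption made for illustration only.
    """
    return (
        "Current observation:\n"
        f"{step.observation}\n\n"
        "Action:\n"
        f"{step.action}\n\n"
        "Predict the next observation."
    )


def predict_next_state(step: Step, generate: Callable[[str], str]) -> str:
    """Query any prompt -> text callable for the predicted next state."""
    return generate(build_prompt(step)).strip()


def toy_generate(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without one."""
    if "touch notes.txt" in prompt:
        return "file created: notes.txt\n"
    return "no change\n"


step = Step(observation="$ ls\n(empty directory)", action="touch notes.txt")
next_state = predict_next_state(step, toy_generate)
print(next_state)
```

Passing the generator in as a callable keeps the sketch independent of any particular inference library; a real model call could be substituted for `toy_generate`.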

Three-Stage Training Pipeline

The models were trained with a three-stage pipeline on more than 10 million environment interaction trajectories:

  1. Continual Pre-training (CPT): This stage injects general-purpose world modeling capabilities by utilizing state transition dynamics and augmented professional corpora.
  2. Supervised Fine-Tuning (SFT): This stage activates next-state-prediction reasoning, enabling the model to logically derive the resulting state of an action.
  3. Reinforcement Learning (RL): This stage improves simulation fidelity using a tailored framework that employs hybrid rubric-and-rule rewards to ensure the simulated environment behaves accurately.
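One way to picture the hybrid rubric-and-rule reward in stage 3 is the sketch below. The source does not say how the two signals are combined, so the gating and weighting here are assumptions, as are the names `rule_reward`, `hybrid_reward`, `toy_judge`, and the JSON state format.

```python
import json
from typing import Callable, Set


def rule_reward(predicted: str, required_keys: Set[str]) -> float:
    """Deterministic rule: the prediction must be a JSON object with all required keys."""
    try:
        state = json.loads(predicted)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(state, dict):
        return 0.0
    return 1.0 if required_keys.issubset(state.keys()) else 0.0


def hybrid_reward(
    predicted: str,
    reference: str,
    required_keys: Set[str],
    rubric_judge: Callable[[str, str], float],
    rubric_weight: float = 0.5,
) -> float:
    """Combine a hard rule check with a graded rubric score in [0, 1].

    Assumption: a failed rule check zeroes the reward; otherwise the two
    scores are mixed linearly.
    """
    rule = rule_reward(predicted, required_keys)
    if rule == 0.0:
        return 0.0
    rubric = min(1.0, max(0.0, rubric_judge(predicted, reference)))
    return (1.0 - rubric_weight) * rule + rubric_weight * rubric


def toy_judge(predicted: str, reference: str) -> float:
    """Stand-in rubric: fraction of reference fields reproduced exactly."""
    pred, ref = json.loads(predicted), json.loads(reference)
    if not ref:
        return 1.0
    return sum(1 for key in ref if pred.get(key) == ref[key]) / len(ref)


reference = '{"cwd": "/tmp", "exit_code": 0}'
keys = {"cwd", "exit_code"}
print(hybrid_reward('{"cwd": "/tmp", "exit_code": 0}', reference, keys, toy_judge))
print(hybrid_reward('{"cwd": "/home", "exit_code": 0}', reference, keys, toy_judge))
print(hybrid_reward("not json", reference, keys, toy_judge))
```

In practice the rubric judge would likely be a model-based grader rather than a field comparison; the toy version only keeps the example runnable.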

AgentWorldBench Evaluation

To measure the performance of language world models, the researchers introduced AgentWorldBench, a benchmark constructed from real-world interactions of five frontier models across nine established benchmarks. Empirical results indicate that Qwen-AgentWorld significantly outperforms existing frontier models at simulating environment dynamics.

Enhancing General Agents via World Modeling

Qwen-AgentWorld enhances general agents through two distinct paradigms:

Decoupled Environment Simulation

Qwen-AgentWorld can act as a standalone environment simulator. This allows scalable, controllable simulation of thousands of real-world environments, which can then be used for agentic RL. Training agents within these simulations yields performance gains that exceed those achieved by training in real environments alone.
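A minimal sketch of the decoupled setup follows: an environment whose transitions come from a world-model callable rather than a real system, plus a loop that collects a rollout for agent training. `SimulatedEnv`, `collect_rollout`, and the toy counter dynamics are hypothetical names chosen for illustration; the source does not describe the actual simulator interface.

```python
from typing import Callable, List, Tuple

WorldModel = Callable[[str, str], str]  # (observation, action) -> next observation
Policy = Callable[[str], str]           # observation -> action


class SimulatedEnv:
    """Environment whose dynamics are predicted by a world model."""

    def __init__(self, initial_observation: str, world_model: WorldModel, max_steps: int = 8):
        self.initial_observation = initial_observation
        self.world_model = world_model
        self.max_steps = max_steps
        self.observation = initial_observation
        self.steps_taken = 0

    def reset(self) -> str:
        self.observation = self.initial_observation
        self.steps_taken = 0
        return self.observation

    def step(self, action: str) -> Tuple[str, bool]:
        """Advance one step; returns (next observation, episode finished)."""
        self.observation = self.world_model(self.observation, action)
        self.steps_taken += 1
        return self.observation, self.steps_taken >= self.max_steps


def collect_rollout(env: SimulatedEnv, policy: Policy) -> List[Tuple[str, str, str]]:
    """Run one episode and record (observation, action, next observation) triples."""
    rollout = []
    observation, done = env.reset(), False
    while not done:
        action = policy(observation)
        next_observation, done = env.step(action)
        rollout.append((observation, action, next_observation))
        observation = next_observation
    return rollout


def toy_world_model(observation: str, action: str) -> str:
    """Stand-in dynamics: a counter that the 'inc' action increments."""
    value = int(observation.split("=")[1])
    return f"counter={value + 1}" if action == "inc" else observation


env = SimulatedEnv("counter=0", toy_world_model, max_steps=3)
rollout = collect_rollout(env, lambda observation: "inc")
print(rollout[-1])
```

Because the environment only depends on a callable, many simulated environments can be created by varying the initial observation, which is one reading of the scalability claim above.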

Unified Agent Foundation Model

Training a model as a world model serves as an effective "warm-up" for general agent tasks. When a model is first trained to understand and simulate environment dynamics, its downstream performance improves across seven different agentic benchmarks, suggesting that world-modeling capabilities are foundational to general agent proficiency.
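To make the warm-up idea concrete, the sketch below turns one interaction trajectory into two kinds of training examples: world-model examples (predict the next observation) and agent examples (predict the action). The data layout and the function names `world_model_examples` and `agent_examples` are assumptions for illustration; the source does not specify how the training data is formatted.

```python
from typing import Dict, List, Tuple

Trajectory = List[Tuple[str, str]]  # ordered (observation, action) pairs


def world_model_examples(traj: Trajectory, final_observation: str) -> List[Dict[str, str]]:
    """Warm-up phase data: (observation, action) -> next observation."""
    observations = [obs for obs, _ in traj] + [final_observation]
    return [
        {"input": f"Observation: {obs}\nAction: {act}", "target": observations[i + 1]}
        for i, (obs, act) in enumerate(traj)
    ]


def agent_examples(traj: Trajectory) -> List[Dict[str, str]]:
    """Agent phase data: observation -> action."""
    return [{"input": f"Observation: {obs}", "target": act} for obs, act in traj]


trajectory = [
    ("page: login form", "type username"),
    ("page: username filled", "click submit"),
]
warmup_data = world_model_examples(trajectory, "page: dashboard")
agent_data = agent_examples(trajectory)
print(warmup_data[0]["target"], "|", agent_data[0]["target"])
```

Under this reading, a model would first be trained on `warmup_data` and then on `agent_data`, so that it has learned the environment's dynamics before learning to act in it.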
