DreamX

DreamX-World

DreamX Team

DreamX-World is a general-purpose world model that creates diverse, high-fidelity worlds that can be explored, controlled, and transformed through actions and event prompts. Going beyond passive video generation, DreamX-World enables interactive world simulation with strong controllability, high visual fidelity, and flexible prompt-driven events. Built on top of a scalable data engine, DreamX-World is trained on a diverse mixture of data sources, including Unreal Engine data, gameplay footage, and real-world videos. Through accurate camera estimation, rigorous data filtering, and carefully curated data distributions, the model learns realistic world dynamics and rich interactive behaviors across a wide range of environments. The training pipeline is progressive: the model first learns world dynamics and fine-grained action control, then acquires the ability to respond to open-ended events, and is further improved with reinforcement learning to strengthen action following, interaction consistency, and visual fidelity. Finally, through forcing and distillation, DreamX-World achieves efficient inference, making interactive generation practical at scale.
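The staged curriculum above can be sketched as a small script. This is a minimal illustration, not the actual training code: the `Stage` dataclass, `CURRICULUM` list, and `train_stage` placeholder are assumptions introduced only to make the ordering of the four stages concrete.

```python
# Illustrative sketch of the progressive training schedule described above.
# Stage, CURRICULUM, and train_stage are hypothetical names, not the real API.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    objective: str


# The progressive curriculum: dynamics/action control -> open-ended events
# -> reinforcement learning -> forcing + distillation.
CURRICULUM = [
    Stage("pretrain", "world dynamics and fine-grained action control"),
    Stage("events", "responding to open-ended event prompts"),
    Stage("rl", "action following, interaction consistency, visual fidelity"),
    Stage("distill", "forcing and distillation for efficient inference"),
]


def train_stage(model, stage):
    """Placeholder: a real implementation would run stage-specific optimization."""
    return model


def run_curriculum(model, stages=CURRICULUM):
    """Apply each training stage in order, recording which stages ran."""
    history = []
    for stage in stages:
        model = train_stage(model, stage)
        history.append(stage.name)
    return model, history
```

The key design point the sketch captures is that later stages build on earlier ones, so the order of the list is load-bearing: event following and RL refinement both presuppose a model that already handles dynamics and action control.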

Navigate and Explore Realistic Worlds

DreamX-World supports interactive exploration with high fidelity, precise control, and rich dynamic scene generation. It generalizes across a broad range of realistic environments, including indoor scenes, urban streets, natural landscapes, and architectural spaces. The model responds accurately to fine-grained action inputs, allowing users or agents to move through generated environments in a controllable and physically plausible way.
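The interaction loop implied here, where a user or agent feeds fine-grained actions and receives generated frames, can be sketched as follows. The `ToyWorldModel` class and its `step` method are purely illustrative stand-ins (the real model generates video frames; here the "frame" is just a camera pose), assumed only to show the action-conditioned rollout pattern.

```python
# Illustrative sketch (not the DreamX-World API) of action-conditioned
# rollout: each step consumes a discrete navigation action and yields output.
class ToyWorldModel:
    """Stand-in for the generative model; tracks only a 2D camera pose."""

    MOVES = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self):
        self.pose = (0, 0)

    def step(self, action):
        dx, dy = self.MOVES[action]
        x, y = self.pose
        self.pose = (x + dx, y + dy)
        # A real world model would decode a video frame conditioned on the
        # action history; we return the updated pose as a proxy.
        return self.pose


def rollout(model, actions):
    """Drive the model with a sequence of fine-grained actions."""
    return [model.step(a) for a in actions]
```

A usage example: `rollout(ToyWorldModel(), ["forward", "forward", "right"])` traces the poses `[(0, 1), (0, 2), (1, 2)]`, mirroring how a controllable, physically plausible trajectory accumulates one action at a time.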

Dive into Dream Worlds

In addition to realistic environments, DreamX-World unlocks a broad space of imaginative world generation. It supports fantastical worlds, game-like environments, science-fiction settings, and highly stylized visual domains, extending world generation from simulation to creation.

Generate in Third-Person View

DreamX-World supports both immersive first-person interaction and coherent third-person world generation. Beyond first-person exploration, the model can simulate third-person experiences in which an agent moves through the world while the camera follows consistently across space and time. This capability enables a wide range of scenarios, from dynamic outdoor traversal to game-like character control, while maintaining stable camera-follow behavior, controllable agent motion, and coherent scene evolution. It is especially important for embodied agents, interactive simulations, and game-inspired world experiences.

Promptable World Events

Beyond navigational control, DreamX-World supports prompt-driven world events that dynamically alter the generated environment. Compared with prior approaches, DreamX-World supports more flexible and compositional event generation while maintaining consistent, interactive, and temporally coherent world evolution. This capability also provides a strong foundation for agents learning from experience, enabling them to encounter and adapt to unexpected situations in dynamic environments.
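The event mechanism described above, where a text prompt injected mid-rollout alters subsequent world evolution while earlier frames stay fixed, can be sketched like this. The `EventSim` class and its `inject`/`step` methods are hypothetical names introduced for illustration; they show only the conditioning pattern, not real generation.

```python
# Hedged sketch of prompt-driven world events: a text event injected during
# a rollout conditions all subsequent frames. EventSim is an assumed name.
class EventSim:
    def __init__(self):
        self.active_events = []
        self.frames = []

    def inject(self, event_prompt):
        # Events are compositional: several prompts can be active at once.
        self.active_events.append(event_prompt)

    def step(self):
        # A real model would generate pixels conditioned on the active
        # events; here each "frame" just records the conditioning state.
        frame = {"t": len(self.frames), "events": tuple(self.active_events)}
        self.frames.append(frame)
        return frame


sim = EventSim()
sim.step()                       # ordinary world evolution, no events yet
sim.inject("it starts to rain")  # event prompt alters the environment
frame = sim.step()               # subsequent frames carry the event
```

Because events accumulate rather than replace one another, this pattern also suggests how compositional events could combine, and why such interventions are useful for exposing agents to unexpected situations during training.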

Next Steps

While DreamX-World already supports high-fidelity generation, controllable interaction, and promptable events, the next frontier is real-time interactive generation of long-duration worlds, where users or agents can act, respond, and explore continuously without compromising temporal coherence or world consistency. Reaching this goal will require further advances in efficiency, long-horizon stability, memory, and interaction modeling, so that generated worlds remain responsive, coherent, and persistent over extended durations.

We also believe that progress in world models should benefit the broader research community. To help accelerate research in this area, we are committed to open-sourcing DreamX-World as soon as possible, including the model and code. We hope this will enable further advances in interactive generation, embodied intelligence, agent training, and creative world building.