Job title: Machine Learning Researcher - World Models
Job type: Permanent
Emp type: Full-time
Industry: Robotics
Salary type: Annual
Salary: negotiable
Job published: 09/03/2026
Job ID: 35305

Job Description

Machine Learning Researcher – World Models (Generative Video & Simulation)

Ready to build models that learn the structure and dynamics of the physical world?

This role focuses on developing world models: large-scale generative systems capable of simulating environments, actions, and interactions over extended time horizons. The goal is to move beyond short video generation toward models that can represent persistent environments and evolving dynamics.

As a Machine Learning Researcher, you’ll work on spatiotemporal generative models that learn how the world changes over time. These models aim to capture physical interactions, causality, and long-term dynamics, forming the foundation for intelligent systems that can reason about future outcomes and learn through simulation.

Your work will explore how generative models transition from producing short visual sequences to maintaining coherent simulations of environments where objects move, interact, and evolve consistently over time.

You’ll contribute across the full modelling lifecycle, from architecture design and training infrastructure through to evaluation and iteration. The role blends deep research with practical implementation, where experimental ideas are tested at scale and integrated into real systems.

This is a research-driven environment where researchers have significant ownership over model design, training strategies, and evaluation frameworks.

Your focus will include:

  • Designing and training spatiotemporal world models capable of learning long-horizon dynamics

  • Advancing video generation systems into persistent simulations that maintain coherence across time

  • Running large-scale training experiments on multi-billion parameter generative models

  • Improving temporal consistency, memory, and controllability in generative architectures

  • Developing evaluation methods for physical plausibility, causal consistency, and simulation stability

  • Working with large video datasets, including synthetic environments and real-world recordings

Hands-on experience with video generation, spatiotemporal modelling, or multimodal generative models is essential. This could include work with diffusion models, autoregressive approaches, transformers, or related architectures.

You should be comfortable implementing recent research, designing experiments, and iterating quickly on large training runs. Experience managing experiments on large GPU clusters and training models at scale is highly valuable.

Strong coding ability in Python is required; experience with C++ or Rust is beneficial.

You’ll have real autonomy over modelling decisions and the opportunity to shape how world models evolve within a small, technically ambitious AI research team.

Compensation: $200,000 – $350,000 base (negotiable depending on level) + equity + benefits

Location: San Francisco (On-site)

If you’re motivated by pushing generative models beyond video into world simulation and long-horizon reasoning, we’d like to speak with you.

All applicants will receive a response.