
Most AI roles build on top of models.

This one builds what makes them actually work.

 

We’re hiring ML Infrastructure Engineers to tackle a hard, real-world problem: understanding what’s happening on live job sites using wearable devices, large-scale video, and AI.

 

This isn’t clean benchmark data.

 

It’s messy, continuous, real-world input flowing from device → edge → cloud, at scale.

 

You’ll be working across:

  • High-throughput video pipelines handling millions of hours of data
  • Training and inference systems for multimodal / LLM-based models
  • GPU infrastructure and performance optimisation
  • Hybrid environments spanning edge, on-prem, and cloud

 

The role is end-to-end, from ingestion through to deployment.

 

You’ll be building the systems that make applied AI viable outside the lab.

 

The team comes from top AI and infrastructure companies, and the company has strong funding and a clear technical roadmap. This is a systems challenge as much as an ML one.

San Francisco (on-site)

$250k–$350k base + strong equity

 

If you’ve built ML or data infrastructure at scale and care about real-world constraints, this is worth a conversation.

 

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 35701

Senior Applied Researcher

Want to build vision-language models that understand complex, real-world environments?

You’ll join a small, highly technical team working on foundational problems in multimodal AI, focused on training models that can interpret, reason about, and act on large-scale first-person video data.

You’ll work directly with the Chief Science Officer, shaping how models are designed, trained, and evaluated. The work sits at the intersection of VLMs, long-context reasoning, and real-world deployment.

The focus is on building systems that move beyond static perception, towards temporal understanding, activity recognition, and higher-level reasoning across dynamic environments.

Your work will centre on:

  • Designing and training VLMs on large-scale video datasets
  • Developing post-training approaches including SFT, RLHF, and parameter-efficient tuning
  • Building scalable training and evaluation pipelines
  • Exploring long-context and temporal modelling
  • Designing efficient systems across edge and server-side inference
  • Defining benchmarks for spatial and behavioural understanding

You’ll bring strong experience training deep learning models, ideally transformer-based, along with hands-on work in vision, language, or multimodal systems.

Experience with large datasets, model optimisation, or deploying models into production environments will be valuable. Exposure to video data or long-context modelling is particularly relevant.

This is a team that values speed, ownership, and first-principles thinking. You’ll be working on open-ended problems with real-world impact, with the freedom to explore and define approaches.

Compensation: Highly competitive salary + equity
Location: San Francisco, onsite

If you’re interested in building multimodal systems that operate in real-world settings and want to join a well-funded, highly skilled research team, please apply now!

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/03/2026
Job ID: 35437