Job title: Staff ML Infrastructure Engineer (GPU & Distributed Systems)
Job type: Permanent
Emp type: Full-time
Industry: Research
Salary type: Annual
Salary: negotiable
Job published: 08/04/2026
Job ID: 35635

Job Description

Are you looking to scale GPU infrastructure up to and beyond 10,000 GPUs?
You'll help push an already high-performing team past its current operating level, using your skills and experience to scale training workloads, improve cluster reliability and utilization, and build systems that hold up under real pressure.
Your focus will be on distributed training and GPU infrastructure, making large-scale training actually usable for researchers—not just possible.
You'll be working across frontier model training, scientific workloads and robotics environments, so you'll be dealing with high-throughput systems and real-world constraints, not just controlled experiments.
You'll join a team that owns compute end-to-end (infra, systems, and operations), working closely with researchers to make training at this scale reliable.
The company has raised over $500M, has real customers, and is now integrating models directly into robotics environments and beyond.
Key experience
  • Experience scaling GPU infrastructure from 2,000 to 10,000+ GPUs
  • Experience with Ray, Slurm or similar
  • Experience supporting core model training

The culture is collaborative and hands-on:
  • Strong focus on knowledge sharing and upskilling
  • Cross-team collaboration with researchers
  • 6-week cycles to allow deep focus and meaningful impact
  • A team that works hard but also likes to keep it fun
Up to $350k base + bonus + equity, DOE
Remote across the US or hybrid options available in SF

All applicants will receive a response. 