Job Description
Interested in building distributed training infrastructure that powers a frontier-scale superintelligence platform?
This role is about creating the systems that make breakthrough AI research possible. You’ll be working on large-scale training infrastructure for LLMs and multimodal models, building the backbone for models that generate new knowledge across multiple domains.
Your work will cover distributed training systems, performance optimisation, and scalable pipelines that enable complex experiments to run across thousands of GPUs. Instead of maintaining legacy stacks, you’ll be designing the infrastructure that pushes models further — and accelerates real-world progress.
You’ll work closely with researchers tackling problems at the cutting edge, ensuring the systems you build directly support new discoveries. This isn’t just another ML engineering role: it’s about creating the foundation for next-generation AI.
You should have:
- Proven experience with distributed ML training frameworks.
- Strong engineering background in Python and C++.
- Understanding of large-scale model training techniques.
- Experience in cloud or HPC environments.
Package: $250k–$350k base + equity, full benefits. Onsite in San Francisco, CA or Boston, MA.
If you want your engineering to enable genuine breakthroughs — not just optimise another product pipeline — this role is for you.
All applicants will receive a response.