Job title: Head of Research - RL & Post-training
Job type: Permanent
Emp type: Full-time
Industry: Artificial Intelligence & Machine Learning
Skills: Reinforcement Learning, Post-training, Research, Research leadership
Salary type: Annual
Salary: negotiable
Location: San Francisco or NYC
Job published: 02/10/2025
Job ID: 33880

Job Description

Head of Research – Post-Training & Reinforcement Learning

Ready to shape how the next generation of AI is trained, aligned, and supervised?

This role is about leading one of the most critical research agendas in AI today: advancing post-training and reinforcement learning methods that ensure increasingly capable models remain aligned, reliable, and safe. You’ll define the environments and frameworks where frontier models learn and set the direction for how society supervises AI as it surpasses human performance.

As Head of Research, you’ll guide a team of applied ML and research experts from FAIR, Meta Reality Labs, Airbnb, Amazon, and beyond. You’ll stay hands-on with the research: designing experiments in RLHF, DPO, and GRPO; developing reward models that move beyond exact-match signals; and building complex RL environments that stress-test reasoning, planning, and long-horizon behaviour. At the same time, you’ll shape the technical vision, ensuring the team’s work translates into production systems already used by leading AI labs.

You’ll also play a visible role in the broader ecosystem: publishing at top venues (NeurIPS, ICLR, ACL, EMNLP), releasing benchmarks and open-source tools, and influencing both technical standards and broader policies for AI alignment and evaluation.

You should bring:
  • Deep research experience in post-training or RL methods (RLHF, DPO, GRPO, reward modelling).
  • Strong background in training and evaluating large language models.
  • Proven publication record at top-tier venues (NeurIPS, ICLR, ICML, ACL, EMNLP).
  • Experience leading research teams and scoping high-impact projects.
  • Curiosity, creativity, and the ability to thrive in a fast-moving startup environment.

Package: $300k–$400k base + significant equity. Full benefits including health, dental, vision, 401(k), unlimited PTO, and global offsites. Onsite in San Francisco preferred (relocation support available), with flexibility for exceptional candidates.

If you want to define how reinforcement learning environments and post-training frameworks shape the future of AGI, this is the role for you. 

All applicants will receive a response.