Advance how agents and LLMs learn from feedback in realistic environments.

If you've been working at the intersection of reinforcement learning and large language models, this is an opportunity to work on the foundations of how AI systems are trained, evaluated, and supervised, with your research shipping into production.

You will work hands-on with fundamental problems spanning LLM post-training, RL simulation environments, and agentic evaluation, shaping core methods and benchmarks used by leading AI labs and enterprises around the world.

The team actively publishes and collaborates with external research labs, with recent work appearing at ACL and NeurIPS. You'll see your ideas move from concept to deployed systems, working alongside engineers who build fast and take research seriously.

This is a research-driven company growing quickly due to real demand for what they're building. If you want your work to matter, both in the literature and in production, this is where to do it.

You'll bring hands-on experience in applied research across RL, LLM post-training, or agent-based systems, with a strong understanding of transformer architectures and fine-tuning. As important as the theory is the ability to ship — you can translate research ideas into production-ready systems that actually work. A track record of publishing at top-tier venues such as NeurIPS, ICML, ACL, or EMNLP is a plus, but what matters most is the quality of your thinking and your ability to execute.

What you'll do

  • Conduct research on LLM post-training methods such as RLHF, RLAIF, and RLVR (see the reward sketch after this list)
  • Design and build realistic RL simulation environments for agents
  • Develop agentic evaluation and supervision frameworks
  • Create and maintain benchmarks for emerging AI capabilities
  • Collaborate with engineers to take research from idea to deployed systems
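
To give a flavour of what "verifiable rewards" (RLVR) means in practice, here is a minimal sketch of a rule-based reward for an exact-answer task. It is a generic illustration, not this team's method; the "Answer:" convention and the example values are assumptions.

    import re

    def verifiable_reward(completion: str, reference_answer: str) -> float:
        """Return 1.0 if the completion's final answer matches the reference."""
        # Assumed convention: the model ends its response with "Answer: <value>".
        match = re.search(r"Answer:\s*(.+)", completion)
        if match is None:
            return 0.0  # no parseable answer, no reward
        return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

    # Usage: score sampled rollouts before a policy-gradient update.
    print([verifiable_reward(c, "42") for c in ("Answer: 42", "Answer: 41")])  # [1.0, 0.0]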

Location — San Francisco · Salary — Up to $300k base + equity, flexible and negotiable DOE

All applicants will receive a response.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 12/03/2026
Job ID: 33119

Rip up the playbook and step into uncharted territory.

If you've been building long-horizon multi-agent systems and pushing the boundaries of AI research, this is the kind of role where curiosity and ambition meet real execution, exploring truly novel problems at the frontier of what's currently possible.

You will work on systems designed to outperform the current state of the art, tackling problems that don't yet have standardised solutions across RL, long-horizon reasoning, LLM post-training for non-myopic objectives, and environment and feedback design.

Whether you're an early-career PhD or highly experienced, what matters most is your ability to push novel ideas into working systems and apply your knowledge across reasoning, RL, and memory to make real-world impact.

This is a small, ambitious team operating where few others are, building and executing quickly in areas such as computational science R&D. This is your opportunity to shape the systems that generate and validate new discoveries, in an environment primed for success.

Skills & experience
  • PhD and/or publications at top conferences across long-horizon reasoning, RL, or similar
  • Post-training experience (RLHF, DPO, reward modelling)
  • Experience working on open-ended research 

Location — San Francisco · Salary — $400k base + 0.5–1% equity, negotiable DOE

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 12/03/2026
Job ID: 35041

The Bot Company

We're building a helpful robot for every home.

We're a small team of engineers, designers, and operators based in San Francisco. Our team comes from Tesla, Cruise, OpenAI, Google, Pixar, and many other great companies. In the past we've shipped to hundreds of millions of users and know what it takes to build amazing products and experiences.

Our team is deliberately lean to promote rapid decision making and do away with bureaucracy and hierarchy. Everyone is an IC and is empowered with massive scope, radical ownership, and direct responsibility. We work across the stack with a culture built for rapid iteration and fast execution.

What we look for in all candidates

All roles at The Bot Company demand extreme sharpness and the ability to move fast in high-intensity environments. Throughout the process, we expect candidates to demonstrate:

• Exceptional mental acuity: you think quickly, learn instantly, and reason across unfamiliar domains.

• Engineering curiosity: you naturally dig into how systems work, even outside your specialty.

• High performance mindset: you move fast, handle ambiguity, and excel when the environment is demanding.

Machine Learning: World Models

We are building neural simulators that understand the grammar of the physical world—including physics, causality, and long-term dynamics.

This role focuses on pushing video generation systems beyond short clips into controllable, large-scale world models that can simulate environments, actions, and interactions over long time horizons. These models will serve as the foundation for robotic intelligence, enabling robots to reason about the future, anticipate consequences, and learn from simulated experience.

You will work on the frontier of generative modeling and large-scale training to build spatiotemporal models capable of learning coherent world dynamics.

What You’ll Do

• Architect Neural Simulators: Design and train large-scale spatiotemporal models capable of learning long-horizon dynamics and physical interactions.

• Build World Models: Develop controllable video generation systems that evolve from short clips into coherent, persistent simulations of the real world.

• Scale Training: Train and optimize multi-billion parameter models across massive GPU clusters.

• Own the Training Loop End-to-End: Design, run, debug, and iterate on large-scale training experiments—diagnosing failure modes, improving data mixtures, and refining evaluation.

• Push Model Architecture Forward: Develop novel approaches for scaling temporal coherence, memory, and controllability in generative models.

• Work Across the Stack: Collaborate with infrastructure, robotics, and autonomy teams to integrate world models into broader robotic intelligence systems.

Requirements

• Very strong coding skills in Python, C++, or Rust.

• Video Generation Expertise: deep experience building or researching high-fidelity video generation systems.

• Architectural Intuition: ability to design novel model architectures and reason about scaling laws, emergent behavior, and failure modes.

• Infrastructure Fluency: comfortable managing large-scale experiments across massive GPU clusters.

• Strong understanding of modern generative modeling techniques (diffusion, transformers, autoregressive models, or related approaches).

Why Join

You’ll work with a small, elite team on challenges that require speed, intelligence, and deep engineering instinct. If you enjoy understanding systems at all levels, moving fast, and thinking even faster, you’ll thrive here.

Location: San Francisco
Employment Type: Full-time
Location Type: On-site
Department: Engineering / Software

Compensation: Base $200K – $350K. Actual compensation will depend on skills, experience, and qualifications.

Base salary is one part of the total compensation package. The role is also eligible for equity through the company’s discretionary equity program, along with a comprehensive benefits package that includes medical, dental, and vision coverage, and access to a 401(k) plan.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 09/03/2026
Job ID: 35305

Want to own the inference layer behind millions of real-world voice AI interactions every day?

You’ll join a profitable, founder-led enterprise conversational AI company powering billions of interactions annually across 30+ languages. Their systems sit behind major global brands and handle millions of customer conversations daily.

They’re now moving toward end-to-end multimodal and speech-to-speech architectures. You’ll own the inference stack powering both their multimodal speech-text LLM and their text reasoning LLM.

This goes well beyond tuning configs.

You will:

• Optimise production inference across A10, A100 and H100 GPUs
• Own scheduler design, KV cache allocation and batching logic
• Build serving systems tailored to multimodal audio-text workloads
• Support agentic, multi-step reasoning under real latency constraints
• Profile kernel-level bottlenecks and fix them properly

You’ve modified inference framework internals before, not just used them. You’re comfortable in Python and C++, and you’re happy diving into CUDA graphs, memory bandwidth limits or custom kernels when required.
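
To give a feel for the KV cache side of this work, here is a back-of-envelope sizing sketch. The model shape is an assumed Llama-70B-style GQA configuration, purely for illustration, not the company's actual stack.

    def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                                 head_dim: int, dtype_bytes: int = 2) -> int:
        # Factor of 2 covers the K and V tensors, one pair per layer.
        return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

    # Assumed GQA shape: 80 layers, 8 KV heads, head_dim 128, FP16 cache.
    per_token = kv_cache_bytes_per_token(80, 8, 128)   # 327,680 bytes/token
    batch, context = 32, 4096
    total_gib = per_token * batch * context / 2**30
    print(f"{per_token} B/token -> {total_gib:.0f} GiB at batch={batch}, ctx={context}")  # ~40 GiB

Numbers like these are why scheduler design and KV cache allocation dominate serving economics: the cache, not the weights, often decides how many requests fit on a GPU.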

This platform processes over 2 million interactions per day. Latency, throughput and cost are production realities, not lab metrics.

Package: €150,000 base + bonus + stock options
Location: Remote within Europe

If you want full ownership of inference performance at real enterprise scale, let’s talk.

All applicants will receive a response.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 03/03/2026
Job ID: 35239

New York City
$200,000–$250,000 + equity

Define how AI agents actually learn in production.

This team is building the foundational learning framework behind enterprise AI systems.

Not prompt wrappers.
Not repeated fine-tuning.

A system that formalises how work gets done and allows agents to improve continuously in real environments.

You’ll design architectures that turn operational behaviour into structured, executable intelligence — making knowledge compound over time through reasoning loops, persistent memory, and human-in-the-loop feedback, without degrading performance.

You’ll work directly with experienced founders and live enterprise customers on problems where reasoning, context, and workflow execution intersect.


What you’ll work on

  • Expanding the core learning framework that governs how agents improve

  • Designing structured context and memory layers

  • Building reasoning loops and feedback systems

  • Creating continuous learning pipelines from live operational data

  • Shipping production-grade Python systems into real deployments


What you’ll bring

  • Experience building non-trivial LLM systems in production

  • Designed agentic workflows involving reasoning, memory, and tool use

  • Strong Python engineering and systems thinking

  • Clear ownership of end-to-end AI systems


The company

  • Series A backed by Sequoia ($28M)

  • Platform approaching one trillion tokens processed

  • Major enterprise customers live

  • Small, engineering-led team building the learning layer enterprise AI will depend on


Everyone will receive a response.

Location: NYC
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 01/03/2026
Job ID: 35206

Training builds capability. Post-training decides what it becomes.

This team are rethinking how large multimodal models learn after pre-training — developing post-training and reinforcement learning methods that help models reason, plan, and interact in real time.

Founded by the researchers behind several of the most influential modern AI architectures, this lab are pushing alignment and learning efficiency beyond standard RLHF. They’re scaling preference-based training (RLHF, DPO, hybrid feedback loops) to new model types and creating systems that learn from interaction rather than static data.
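
For reference, the DPO objective mentioned above comes down to a few lines. This is a generic sketch of the published loss (Rafailov et al., 2023), not this lab's implementation; the inputs are assumed to be per-response log-probs already summed over tokens.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Each response's implicit reward is its log-ratio against the frozen
        # reference model; the loss widens the chosen-minus-rejected margin.
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()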

You’ll work at the intersection of post-training, RL, and model architecture — designing reward models, scalable evaluation frameworks, and training strategies that make large-scale learning measurable and reliable. It’s applied research with direct impact, supported by serious compute and a tight researcher-to-GPU ratio.

You’ll bring experience in large-scale post-training or reinforcement learning (RLHF, DPO, or SFT pipelines), a solid grasp of LLM or multimodal training systems, and the curiosity to explore new optimisation and alignment methods. A publication record at top venues (NeurIPS, ICLR, ICML, CVPR, ACL) is a plus, but impact matters more than titles.

The team are based in San Francisco, working mostly in person. $1 million+ total compensation. Base salary circa $300K – $600K (negotiable) plus stock and bonus — exact package depends on experience.

If you want to work where post-training meets architecture — shaping how foundation models learn, reason, and adapt — this is that opportunity.

All applicants will receive a response.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 11/02/2026
Job ID: 34012

GPU Optimisation Engineer — Real-Time Inference

Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal workloads?

This team is building low-latency AI systems where milliseconds actually matter. The target isn’t “faster than baseline.” It’s sub-50ms time-to-first-token at 100+ concurrent requests on a single H100 — while maintaining model quality.
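
For a sense of why that target is demanding, a memory-bandwidth roofline puts a floor on every bandwidth-bound decode step: the weights (and KV cache) must stream from HBM at least once per step. The figures below are illustrative assumptions, not the team's actual model.

    # Illustrative assumptions: 70B params served in FP8 on one H100 SXM.
    hbm_bandwidth = 3.35e12      # bytes/s, peak HBM3 bandwidth
    weight_bytes = 70e9 * 1      # FP8: one byte per parameter
    kv_cache_bytes = 40e9        # assumed KV cache read per step at high concurrency

    step_floor_ms = (weight_bytes + kv_cache_bytes) / hbm_bandwidth * 1e3
    print(f"decode-step floor: {step_floor_ms:.1f} ms")  # ~32.8 ms even at peak bandwidth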

They’re hiring a GPU Optimisation Engineer who understands GPUs at an architectural level. Someone who knows where performance is really lost: memory hierarchy, kernel launch overhead, occupancy limits, scheduling inefficiencies, KV cache behaviour, attention paths. The work sits close to the metal, inside inference execution — not general infra, not model research.

You’ll operate across the kernel and runtime layers, profiling large-scale speech and multimodal models end-to-end and removing bottlenecks wherever they appear.

What you’ll work on

  • Profiling GPU bottlenecks across memory bandwidth, kernel fusion, quantisation, and scheduling

  • Writing and tuning custom CUDA / Triton kernels for performance-critical paths (see the sketch after this list)

  • Improving attention, decoding, and KV cache efficiency in inference runtimes

  • Modifying and extending vLLM-style systems to better suit real-time workloads

  • Optimising models to fit GPU memory constraints without degrading output quality

  • Benchmarking across NVIDIA GPUs (with exposure to AMD and other accelerators over time)

  • Partnering directly with research to turn new model ideas into fast, production-ready inference
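
As a flavour of that kernel work, here is a minimal Triton kernel that fuses a scale and an add into a single memory pass. It is a toy illustration of fusion, assuming a CUDA device with Triton installed, not one of the team's actual performance-critical paths.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_scale_add(x_ptr, y_ptr, out_ptr, alpha, n, BLOCK: tl.constexpr):
        # Each program instance processes one BLOCK-sized tile.
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        # Fusing alpha*x + y into one kernel avoids a round trip to HBM
        # for the intermediate result.
        tl.store(out_ptr + offs, alpha * x + y, mask=mask)

    x = torch.randn(1 << 20, device="cuda")
    y = torch.randn_like(x)
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    fused_scale_add[grid](x, y, out, 2.0, x.numel(), BLOCK=1024)
    assert torch.allclose(out, 2.0 * x + y)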

This is hands-on optimisation work across the stack. No layers of bureaucracy. No “platform ownership” theatre. Just deep performance engineering applied to models that are actively evolving.

What tends to work well

  • Strong experience with CUDA and/or Triton

  • Deep understanding of GPU execution (memory hierarchy, scheduling, occupancy, concurrency)

  • Experience optimising inference latency and throughput for large generative models

  • Familiarity with attention kernels, decoding paths, or LLM-style runtimes

  • Comfort profiling with low-level GPU tooling

The company is revenue-generating, its models are used by global enterprises, and the SF R&D team is expanding following a recent raise. This is growth hiring, not backfill.

Package & location

  • Base salary: up to ~$300,000 (negotiable based on depth of experience)

  • Equity: Meaningful stock

  • Location: San Francisco preferred (relocation and visa sponsorship can be provided)

If you care about real-time constraints, GPU architecture, and squeezing every last millisecond out of large models, this is worth a conversation.

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 11/02/2026
Job ID: 34843

Applied Scientist – Vision Language Models (Multimodal Reasoning)

Ready to build VLMs that go beyond captioning and simple grounding?

This role is centred on advancing vision-language models that power intelligent agents operating in complex, real-world environments. The focus is firmly on multimodal model design, training, and post-training, with a substantial computer vision component.

As an Applied Scientist, you’ll work on large multimodal models that integrate visual inputs with language-based reasoning. You’ll explore how VLMs can move from recognition and description toward structured understanding, task execution, and agentic decision-making.

Your work will include designing model architectures, improving cross-modal alignment, and developing post-training strategies that strengthen reasoning, factual consistency, and controllability. You’ll contribute across the full lifecycle, from data curation and supervised fine-tuning through to preference optimisation and evaluation.

This is a research-heavy role with clear production impact. You’ll prototype new ideas, run rigorous experiments, and collaborate with engineering teams to deploy models into live agent workflows.

Your focus will include:

  • Training and fine-tuning large-scale vision-language models
  • Improving multimodal alignment between image and text representations
  • Applying post-training techniques such as SFT, RLHF, DPO, and reward modelling
  • Designing evaluation frameworks for reasoning quality, grounding accuracy, and robustness
  • Working with large multimodal datasets, including synthetic and proprietary data

Hands-on work with VLMs or multimodal foundation models is essential. Experience in post-training, alignment, or preference learning is highly valued.

A solid understanding of how to evaluate multimodal systems, including hallucination, grounding failures, and reasoning gaps, is important. You should be comfortable reading and implementing recent research, and designing experiments that move models forward in measurable ways.

You’ll have ownership over modelling decisions and the opportunity to influence how multimodal intelligence is shaped within a fast-growing AI team.

Compensation: $200,000 - $320,000 base (negotiable depending on level) + bonus + meaningful equity + benefits

Location: SF Bay Area (Hybrid). Remote flexibility in the short term.

If you’re motivated by pushing vision-language models toward deeper reasoning and real-world capability, we’d like to speak with you!

All applicants will receive a response.

Location: United States
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 11/02/2026
Job ID: 33847

Research Engineer – Computer Vision & Machine Learning

Want to build vision systems that let machines understand the physical world as naturally as we do?

This role sits within a highly technical team developing a new class of computing devices where perception, language, and interaction are tightly integrated. Vision is a core capability. Your work will directly influence how machines see, reason about space, and collaborate with humans in real-world environments.

You’ll join a specialist vision group working across 3D computer vision and machine learning. The problems sit at the boundary between learned models and physical reality, including gaze tracking, SLAM, multi-camera geometry, and systems that explicitly model optics, refraction, and light transport. The focus is on geometry-aware, physically grounded approaches rather than purely pixel-driven modelling.

This is a hands-on research engineering role. You’ll move between reading papers, building and training models, designing datasets, running controlled experiments, and deploying onto real hardware. You’ll work closely with firmware and hardware teams to ensure models operate reliably on-device.

Your work will include:

  • Developing ML models across 3D perception, tracking, and spatial understanding

  • Designing model architectures, training pipelines, evaluation frameworks, and inference systems

  • Working with large-scale, multi-camera and sensor-rich datasets

  • Translating state-of-the-art research into robust, production-ready systems

  • Creating new approaches when existing methods do not meet performance or physical constraints

You’ll have genuine technical ownership. The team values clear thinking, strong experimental discipline, and the ability to make informed bets on promising ideas.

You’ll likely bring end-to-end experience building computer vision and ML models, alongside strong familiarity with modern research in 3D or geometry-aware vision. Hands-on experience with PyTorch or JAX is expected, as is comfort working with complex datasets. The ability to operate independently in ambiguous environments is important, as is clear communication across research, hardware, and product teams.

A Bachelor’s degree or higher in computer science, machine learning, computer vision, applied mathematics, or a related field is required. A Master’s or PhD is a plus, particularly if you’ve worked on geometry-aware or physically informed modelling approaches. Experience deploying ML systems into real products or working in high-ownership startup environments would be valuable.

Compensation: $190,000 - $320,000 base (depending on experience) + equity
Benefits: 401(k) matching, 100% employer-paid health, vision, and dental insurance, unlimited PTO and sick time, medical FSA matching
Location: San Francisco, on-site collaboration required

If you’re motivated by building geometry-aware vision systems that connect AI to the physical world in meaningful ways, we’d like to hear from you!

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 11/02/2026
Job ID: 34942

Lead Evaluation Engineer — Speech & Multimodal Models

How do you measure whether an AI voice truly sounds real — and prove it with data?

You’ll join an AI team developing large-scale speech and multimodal systems for real-time interaction — models that generate, clone, and understand voice with natural expression and precision.

This is a founding evaluation role, in a new dedicated Evals team defining how these models are measured, improved, and deployed safely at scale. You’ll design objective and subjective evaluation pipelines, run large-scale human studies, and build automated systems that turn perception into measurable signal.

Your work will span every stage of model development — from research to production — collaborating with speech, audio, and ML teams to close the loop between modelling, feedback, and user experience.

What you’ll do:
• Build and scale evaluation pipelines for TTS, voice conversion, and ASR systems
• Design human studies for subjective testing (e.g. MOS, ABX)
• Define and implement objective metrics such as WER, intelligibility, naturalness, and prosody (see the WER sketch after this list)
• Automate evaluation dashboards and reporting systems
• Train auxiliary models to capture new evaluation dimensions
• Collaborate across data, model, and product teams to drive measurable improvement
• Establish and scale the evaluation function as the team grows
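
As a concrete example of the objective-metric work, here is a minimal word error rate implementation via edit distance. It is a plain-Python sketch for illustration; production pipelines would normally add text normalisation and typically lean on a library such as jiwer.

    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub / del / ins
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("the cat sat", "the cat sat down"))  # one insertion over three words, ~0.33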

You’ll bring:
• Strong experience building or running eval systems for speech or multimodal models
• Familiarity with ASR, TTS, or voice cloning pipelines
• Experience designing user studies or subjective model evaluation
• Solid understanding of statistics and experimental design
• Proficiency in Python and ML frameworks (PyTorch, Hugging Face, etc.)
• Strong communication skills and cross-functional mindset

Why this role:
This is a rare chance to build the evaluation foundation for models already deployed globally — shaping how next-generation speech systems are measured and improved. You’ll have the autonomy to define standards, lead future hires, and see your work directly impact millions of real-world interactions.

Fully remote (EU timezones preferred), global team. Competitive salary + meaningful stock options.

The company are well funded following a nine-figure funding round, with significant runway for meaningful growth, plenty of compute, and active hiring!

Apply today. Everyone will get a response.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 04/02/2026
Job ID: 34414