Want to work on what comes after today’s LLMs?

Here you’ll join a frontier AI team focused on building towards superintelligence: not just scaling existing models, but improving how they reason, plan, and act over long horizons.

The work centres on training systems that go beyond next-token prediction: models that can explore, make decisions, use tools, and improve through interaction with complex environments.

A big part of this is post-training. You’ll be working across supervised fine-tuning (SFT), reinforcement learning, and agent-based training loops, shaping how models behave rather than just how they score on benchmarks.
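
At its simplest, the SFT piece of that loop is next-token cross-entropy on curated prompt/response data. A minimal sketch using Hugging Face transformers (the model name and example are placeholders, not this team’s actual stack):

```python
# Minimal single-step SFT sketch; model and data are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical small stand-in for a 30B+ model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One curated prompt/response pair; real post-training uses large curated sets.
batch = tokenizer("User: What is 2 + 2?\nAssistant: 4", return_tensors="pt")

# Causal-LM objective: pass the inputs as labels and the model computes the
# shifted next-token cross-entropy internally.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```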

There’s also a strong focus on architecture and scale. The team are interested in people who work with mixture-of-experts (MoE) systems and large models in the 30B to 100B+ parameter range, with access to serious compute. This is the kind of environment where you will optimise, experiment, and iterate quickly.
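
For readers less familiar with MoE: each token is routed to a small subset of expert networks, so only a fraction of the parameters is active per token. A toy top-2 routing sketch (illustrative only; production systems add load balancing, capacity limits, and expert parallelism):

```python
# Toy top-k mixture-of-experts layer; real MoE systems are far more involved.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts/token
        weights = F.softmax(weights, dim=-1)                # normalise their weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                    # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(TopKMoE(dim=64)(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```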

You’ll be joining a small, research-heavy group with real autonomy, backed by a larger organisation with the resources to support ambitious work. You’ll have a meaningful impact on core foundational model development.

High ownership, fast pace, and a clear focus on pushing capability forward.

What you’ll bring:

- Experience working on large models (ideally 30B+ parameters)
- Exposure to MoE or large-scale distributed training
- Background in LLM post-training, RL, or agent systems
- Experience in a frontier lab or similarly ambitious environment

This is the kind of environment many people leave top labs to build for themselves. The difference here is that the backing, compute, and team are already in place.

Salary: $350k - $450k Base DOE + Sizeable Stock Options and Bonuses
Location: SF (hybrid), US remote, or London

If you’re thinking seriously about where frontier model development is heading next, this is worth a conversation.

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 35932

Want to own how an AI product actually thinks at scale?

You’ll join a team building one of the largest conversational AI platforms globally, already used by 50M+ people and growing fast. This isn’t an API wrapper or a thin product layer. AI is the product.

You’ll take ownership of the core model behaviour, shaping how the system responds, adapts, and improves across millions of real conversations. That means working where model design meets product reality, where latency, cost, safety, and user experience all collide.

You’ll lead from the front. Still hands-on, still in the code, but responsible for the direction.

The work sits across post-training, inference, and system design. You’ll be making decisions that directly affect how users experience the product every day.

Your focus will include:

  • Owning LLM behaviour across a high-scale conversational system
  • Fine-tuning and adapting open-source models such as Llama, Mistral, and Qwen
  • Improving response quality, alignment, and conversational memory
  • Designing evaluation pipelines that reflect real user interactions, not just offline benchmarks
  • Optimising inference for latency, cost, and reliability at scale

You’ll also lead a small team, setting direction while staying close to implementation. This is not a step away from the work.

There’s real technical ownership here. You’ll define trade-offs across:

  • Retrieval-augmented generation (RAG) versus fine-tuning approaches (a toy RAG flow is sketched after this list)
  • Model selection and architecture decisions
  • Scaling strategies across compute, latency, and cost
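
On that first trade-off: RAG injects knowledge at inference time via retrieved context, while fine-tuning bakes it into the weights. A deliberately toy RAG flow (the corpus, the keyword-overlap retriever, and generate() are all invented for illustration):

```python
# Toy RAG flow: retrieve relevant text, then condition generation on it.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return f"[answer grounded in: {prompt.splitlines()[1]}]"

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```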

You’ll likely have experience building and deploying LLM systems in production, not just experimenting. You understand how models behave in messy, real-world environments and how to improve them iteratively.

Background-wise, you might come from conversational AI, assistants, or agent-based systems. You’ve probably worked with post-training methods like LoRA, QLoRA, SFT, RLHF, or DPO, and you’re comfortable with modern tooling across PyTorch, Hugging Face, and inference stacks.
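
As a flavour of that tooling, here’s a hedged sketch of LoRA adaptation with the peft library (model choice and hyperparameters are illustrative, not this team’s configuration):

```python
# LoRA sketch with peft: train small low-rank adapters instead of full weights.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights train
```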

Why this role?

You’ll be working on a product with real usage at global scale. The feedback loop is immediate. Changes you make will impact millions of interactions.

The team moves quickly. Ideas are tested and shipped in days, not quarters. There’s minimal process overhead and a strong bias toward building.

You’ll also be operating in a product space that brings real complexity, including content moderation and safety challenges. It’s not a clean lab environment; it’s production AI with all the edge cases that come with it.

Package

Salary: ~$200,000 base + ~$80,000 equity
Location: Fully remote (global)
Type: Full-time (B2B contract or direct employment)

If you’re looking to own LLM systems at scale, technically and directionally, this is worth exploring.

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: USD $200,000.00
Job published: 30/04/2026
Job ID: 35800

Want to work on one of the hardest unsolved problems in voice AI — making it actually sound like a human conversation?

Most voice AI falls apart the moment a conversation gets messy. Someone interrupts, emotions shift, the flow breaks — and the model can't keep up.

A small, ambitious SF startup is tackling exactly these problems, building speech models that handle natural conversation the way humans actually experience it. They have a working prototype and early commercial traction across several high-profile industry verticals.

The role

As a Senior Research Scientist, your focus is post-training — curating data, fine-tuning pre-trained speech models, and building the evaluation infrastructure that validates it all. You'll work on large-scale models with access to significant data resources.

What you'll do

  • Shape the data that goes into post-training — sourcing, cleaning and structuring it for large speech models

  • Supervised fine-tuning of pre-trained speech models

  • Build evaluation workflows — automated and human-in-the-loop (a minimal automated example follows this list)

  • Drive measurable improvements in hallucination rates, instruction-following and generalisation
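
As one example of the automated side, word error rate (WER) is the standard scripted check for recognition quality; a minimal sketch using the jiwer library (the transcripts are invented):

```python
# Score ASR hypotheses against references by word error rate (WER).
import jiwer

reference = ["turn the volume down please", "book a table for two at seven"]
hypothesis = ["turn the volume down", "book a table for two at eleven"]

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # 2 errors / 12 words
```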

What you'll bring

  • PhD in ML or a related field with a strong publication record

  • Hands-on experience training large speech models — ASR, TTS, or speech-to-speech

  • Solid post-training and SFT experience

The founding team includes a founding engineer from a billion-dollar AI company (where they co-created one of the first generative models in the field) and the co-creator of the first generative voice at one of the world's largest tech companies.

Compensation is between $400k-$500k base with generous equity.

Based in San Francisco, onsite. Relocation support is available for candidates in the US who are willing to make the move.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 34047

ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time?

You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

They’re building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits, extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.

This is not about training models. It’s about making them fast, efficient, and production-ready.

What you’ll work on:

  • Building high-performance serving systems for LLM, speech, and vision models
  • Scaling inference to production workloads with strict latency requirements
  • Optimising GPU utilisation and execution efficiency
  • Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
  • Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang (a minimal vLLM example follows this list)
  • Profiling and debugging performance across GPU, memory, and system layers
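
For a sense of the serving layer, here’s a minimal offline vLLM sketch (model choice and sampling settings are placeholders); vLLM applies continuous batching and paged KV-cache management under the hood:

```python
# Minimal vLLM offline inference; serving deployments use the same engine.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarise continuous batching in one sentence.",
    "Why does KV cache size limit effective batch size?",
]
for output in llm.generate(prompts, params):  # requests are batched internally
    print(output.outputs[0].text)
```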

What you’ll bring:

  • Strong experience with ML inference or model serving systems
  • Deep understanding of latency and throughput optimisation in production
  • Solid Python and PyTorch skills, plus a systems or performance engineering mindset
  • Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.

There’s real ownership here. You’ll help define how next-generation AI systems are served.

Package:
$220,000 – $320,000 base + equity
San Francisco, onsite 3 days per week

If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 34247

Most AI roles build on top of models.

This one builds what makes them actually work.

We’re hiring ML Infrastructure Engineers to tackle a hard, real-world problem: understanding what’s happening on live job sites using wearable devices, large-scale video, and AI.

This isn’t clean benchmark data. It’s messy, continuous, real-world input flowing from device → edge → cloud, at scale.

You’ll be working across:

  • High-throughput video pipelines handling millions of hours of data
  • Training and inference systems for multimodal / LLM-based models
  • GPU infrastructure and performance optimisation
  • Hybrid environments spanning edge, on-prem, and cloud

The role is end-to-end, from ingestion through to deployment. You’ll be building the systems that make applied AI viable outside the lab.

The team comes from top AI and infrastructure companies, with strong funding and a clear technical roadmap. This is a systems challenge as much as an ML one.

San Francisco (on-site)
$250k–$350k base + strong equity

If you’ve built ML or data infrastructure at scale and care about real-world constraints, this is worth a conversation.
All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 35701

Senior C++ Engineer (AI Inference / Real-Time Systems)

Want to build AI systems where milliseconds actually matter?

You’ll join a team developing a next-generation surgical platform, combining real-time AI, advanced imaging, and tightly integrated hardware. This isn’t model training or offline experimentation. It’s about how AI behaves in the real world, under strict latency, reliability, and regulatory constraints.

You’ll work on low-latency inference systems running in tens of milliseconds, directly impacting how critical decisions are made in live environments.

This is systems engineering at its core.

You’ll design and optimise C++ pipelines that handle real-time data from cameras and sensors, ensuring models run predictably, efficiently, and safely. The work sits close to the hardware, close to the constraints, and close to the outcome.

There’s real ownership here. You’ll help shape the core inference framework as it evolves, not just contribute to it.

You’ll join a small, high-calibre team with experience across robotics, imaging, and AI systems. They’ve already secured strong funding and are building toward a regulated, production-ready platform.

What you’ll focus on:

  • Building real-time, multithreaded C++ pipelines for AI inference
  • Optimising latency, memory usage, and system performance across CPU and GPU
  • Working closely with imaging pipelines, cameras, and sensor data
  • Contributing to GPU-accelerated components where relevant (CUDA)
  • Developing production systems within a regulated (FDA) environment

What you’ll bring:

  • Strong experience with modern C++ (C++17/20+)
  • Background in real-time or latency-sensitive systems
  • Experience with multithreading, concurrency, and performance optimisation
  • Familiarity with AI inference or computer vision pipelines
  • Experience working in regulated environments (FDA, IEC 62304 or similar)

You might also have experience with CUDA, hardware integration, or medical imaging systems, but it’s not essential.

This is the kind of role where engineering decisions have real-world consequences. Performance, determinism, and reliability aren’t nice-to-haves; they’re fundamental.

Package:
Salary: $160,000 – $200,000 base
Bonus: 10–15%
Equity: Meaningful upside
Location: Remote (US & Europe)

If you’re interested in building real-time AI systems in environments where failure isn’t an option, it’s worth a conversation.

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 35734

Ready to architect the future of human-computer voice interaction?

Join an established conversational AI company as they transition from traditional cascaded speech systems to cutting-edge E2E speech-to-speech technology. You'll lead this transformation, building multimodal systems that will redefine how millions interact with AI.

The opportunity

You'll be leading the development of speech technology that directly impacts real users at massive scale. The company processes millions of daily interactions across major enterprise clients, meaning your research will shape real-world conversational experiences.

You'll spearhead the development of full-duplex speech systems, creating truly natural AI conversations that go far beyond current capabilities.

Your impact

  • Design and build next-generation multimodal speech LLM architecture from the ground up
  • Drive breakthroughs in speech-to-speech modeling and full-duplex conversation systems
  • Tackle turn-taking, interruption handling, and simultaneous speech processing
  • Bridge cutting-edge research with enterprise-grade production systems
  • Lead a growing team focused on SOTA speech-to-speech breakthroughs and own the development end-to-end

What you'll bring

  • Deep understanding of SOTA speech models and neural audio processing
  • Experience building speech language models/multimodal systems
  • Strong background in speech AI research and modern speech architectures

This is all underpinned by access to a large corpus of real enterprise conversational data and serious GPU infrastructure.

The company has built everything in-house, giving you complete technical control and the freedom to explore any approach that delivers value.

With their established market position and proven track record, you'll have the resources and real-world testing ground to make a transformative impact with your research.

Location

Remote (must be within an EU timezone).

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 33350

Want to build the speech and audio models that define how the next generation of voice AI actually sounds and listens?

A well-funded AI startup has developed new model architectures that make real-time conversational AI finally viable at scale. While most voice AI still suffers from delays and computational bottlenecks, they've solved the core efficiency problems that have held the field back.

The role

As their Senior Research Scientist, you'll build core speech foundation models that could define the next decade of voice interaction. You'll work on novel architectures that have immediate real-world impact for thousands of customers.

What you'll do

  • Design and implement SOTA speech foundation models

  • Develop efficient algorithms for speech processing and audio understanding

  • Create scalable systems that handle massive audio workloads

  • Build comprehensive evaluation methods to validate model performance

  • Collaborate with engineering teams to transition research into production

What you'll bring

  • Deep expertise in modern speech technologies (TTS, Speech LLMs, Voice Conversion/Cloning, Speech Translation, ASR, Audio Understanding)

  • Strong background in generative modelling for audio and speech

  • Publications at leading conferences

  • Track record of implementing research ideas from concept to production

You'll join a solid research team, including technical founders who've published work that's fundamentally shifted how the field thinks about efficient, large-scale foundation models. They're well-funded and generating strong revenue. Comp is on par with top AI labs, with a base of $400k+ DOE plus a generous equity package.

The role is based in San Francisco, hybrid with 4 days a week in the office.

If you're excited about building the foundational models that will power the next generation of voice AI, we'd love to hear from you.

All applicants will receive a response.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 33251

Research Engineer – Computer Vision & Machine Learning

Want to build vision systems that let machines understand the physical world as naturally as we do?

This role sits within a highly technical team developing a new class of computing devices where perception, language, and interaction are tightly integrated. Vision is a core capability. Your work will directly influence how machines see, reason about space, and collaborate with humans in real-world environments.

You’ll join a specialist vision group working across 3D computer vision and machine learning. The problems sit at the boundary between learned models and physical reality, including gaze tracking, SLAM, multi-camera geometry, and systems that explicitly model optics, refraction, and light transport. The focus is on geometry-aware, physically grounded approaches rather than purely pixel-driven modelling.
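
To give a flavour of “geometry-aware”: the basic building block beneath much of this is the pinhole projection from 3D points to pixels. A toy sketch with invented intrinsics:

```python
# Toy pinhole-camera projection: 3D points in the camera frame -> pixels.
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx (invented intrinsics)
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

points_cam = np.array([[0.1, -0.2, 2.0],    # metres, in front of the camera
                       [0.0,  0.0, 5.0]])

uv_h = (K @ points_cam.T).T          # homogeneous image coordinates
uv = uv_h[:, :2] / uv_h[:, 2:3]      # perspective divide -> pixel coordinates
print(uv)                            # [[360. 160.] [320. 240.]]
```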

This is a hands-on research engineering role. You’ll move between reading papers, building and training models, designing datasets, running controlled experiments, and deploying onto real hardware. You’ll work closely with firmware and hardware teams to ensure models operate reliably on-device.

Your work will include:

  • Developing ML models across 3D perception, tracking, and spatial understanding

  • Designing model architectures, training pipelines, evaluation frameworks, and inference systems

  • Working with large-scale, multi-camera and sensor-rich datasets

  • Translating state-of-the-art research into robust, production-ready systems

  • Creating new approaches when existing methods do not meet performance or physical constraints

You’ll have genuine technical ownership. The team values clear thinking, strong experimental discipline, and the ability to make informed bets on promising ideas.

You’ll likely bring end-to-end experience building computer vision and ML models, alongside strong familiarity with modern research in 3D or geometry-aware vision. Hands-on experience with PyTorch or JAX is expected, as is comfort working with complex datasets. The ability to operate independently in ambiguous environments is important, as is clear communication across research, hardware, and product teams.

A Bachelor’s degree or higher in computer science, machine learning, computer vision, applied mathematics, or a related field is required. A Master’s or PhD is a plus, particularly if you’ve worked on geometry-aware or physically informed modelling approaches. Experience deploying ML systems into real products or working in high-ownership startup environments would be valuable.

Compensation: $190,000 - $320,000 base (depending on experience) + equity
Benefits: 401(k) matching, 100% employer-paid health, vision, and dental insurance, unlimited PTO and sick time, medical FSA matching
Location: San Francisco, on-site collaboration required

If you’re motivated by building geometry-aware vision systems that connect AI to the physical world in meaningful ways, we’d like to hear from you!

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 34942

Most AI systems work in demos. Very few hold up in real customer environments.

This team is building the decision-making systems behind AI agents that operate across voice, chat, and email — where performance is measured in outcomes, not benchmarks.

You’ll work on models that need to reason over time, handle multi-step workflows, and stay consistent across entire interactions. Not just once, but repeatedly, under real-world constraints.

This is applied research that ships. You’ll take ideas from early concept through to production, owning how systems behave when deployed at scale.

The challenge is not just capability. It’s reliability — making reasoning systems that can operate across long-context interactions, manage memory, use tools, and execute workflows without breaking down.
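
To make that concrete, here’s a deliberately toy agent loop: call_model is a hypothetical stand-in for an LLM call, the tool registry is invented, and the hard step cap is one simple guard of the kind described.

```python
# Toy tool-using agent loop with a step cap as a basic reliability guard.
import json

def call_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for an LLM that emits a JSON tool request."""
    return json.dumps({"tool": "lookup_order", "args": {"order_id": "A123"}})

TOOLS = {"lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"}}

def run_agent(user_message: str, max_steps: int = 5) -> list[dict]:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):            # cap steps to stop runaway loops
        action = json.loads(call_model(messages))
        tool = TOOLS.get(action["tool"])
        if tool is None:                  # unknown tool: record it, don't guess
            messages.append({"role": "system", "content": "unknown tool requested"})
            continue
        result = tool(**action["args"])   # execute and feed the result back
        messages.append({"role": "tool", "content": json.dumps(result)})
        return messages                   # toy flow: stop after one tool call
    return messages

print(run_agent("Where is order A123?"))
```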

You’ll be working closely with product and engineering teams, iterating on real-world failures, and improving systems based on how they actually perform in production.


What you’ll work on

  • Designing and improving reasoning systems for real-world agent workflows
  • Building and refining memory, retrieval, and multi-step execution systems
  • Developing post-training and evaluation approaches for deployed models
  • Iterating on systems based on real user behaviour and performance
  • Taking research ideas through to production environments

What they’re looking for

  • Experience working on LLM systems in production
  • Background in RL, post-training, or agent-based systems
  • Experience building systems involving memory, reasoning, or tool use
  • Strong engineering fundamentals and ability to ship end-to-end systems
  • Clear understanding of how models behave outside of controlled environments

Why this role

  • Work on systems judged by real users, not offline metrics
  • Direct ownership of how models behave in production
  • High autonomy in a fast-moving, product-driven team
  • Real-world complexity, not sandboxed problems

Package

📍 San Francisco or London (on-site)
💰 $200K–$400K base + equity

All applicants will receive a response.

Location: SF, onsite
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35338