
Want to build systems that actually hold up under long-running AI workloads?

Most agentic systems for science don’t fail at the model layer. They fail because the infrastructure can’t support long-horizon execution.

You’ll join a team building autonomous AI agents that run full research cycles: ingesting thousands of papers, forming hypotheses, running experiments, and producing traceable outputs used by real scientific teams.

The challenge is making that work in production.

You’ll own the systems behind it: APIs, data pipelines, and platform architecture designed for long-running workloads, large-scale ingestion, and iterative experimentation loops. This is full-stack in scope but backend in depth, where system design decisions directly impact what the platform can do.

You’ll be working across:

  • Backend services in Python or Node, building scalable APIs (FastAPI/REST)
  • Data pipelines supporting agent execution and scientific workflows
  • Cloud infrastructure (AWS/GCP), containerisation (Docker, Kubernetes)
  • CI/CD, observability, and reliability for systems under continuous load

This isn’t a generalist full-stack role. You’ll need to understand how systems behave under heavy data and compute demands, and be comfortable making architectural trade-offs across distributed systems.

The team is small, high-calibre, and already running real workloads with revenue traction. Backed by $70M+, they’re building infrastructure that defines how AI is applied to scientific discovery.

 

Salary: $200,000–$350,000 + equity
Location: San Francisco (onsite)

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/03/2026
Job ID: 35569

Senior Applied Researcher

Want to build vision-language models that understand complex, real-world environments?

You’ll join a small, highly technical team working on foundational problems in multimodal AI, focused on training models that can interpret, reason, and act on large-scale first-person video data.

You’ll work directly with the Chief Science Officer, shaping how models are designed, trained, and evaluated. The work sits at the intersection of VLMs, long-context reasoning, and real-world deployment.

The focus is on building systems that move beyond static perception, towards temporal understanding, activity recognition, and higher-level reasoning across dynamic environments.

Your work will centre on:

  • Designing and training VLMs on large-scale video datasets
  • Developing post-training approaches including SFT, RLHF, and parameter-efficient tuning
  • Building scalable training and evaluation pipelines
  • Exploring long-context and temporal modelling
  • Designing efficient systems across edge and server-side inference
  • Defining benchmarks for spatial and behavioural understanding

You’ll bring strong experience training deep learning models, ideally transformer-based, along with hands-on work in vision, language, or multimodal systems.

Experience with large datasets, model optimisation, or deploying models into production environments will be valuable. Exposure to video data or long-context modelling is particularly relevant.

This is a team that values speed, ownership, and first-principles thinking. You’ll be working on open-ended problems with real-world impact, with the freedom to explore and define approaches.

Compensation: Highly competitive salary + equity
Location: San Francisco, onsite

If you’re interested in building multimodal systems that operate in real-world settings, and want to join a well-funded, highly skilled research team, please apply now!

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/03/2026
Job ID: 35437

Ready to own the data pipeline powering the voice of the next generation of AI characters?

You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.

Think genuinely natural conversations, handling interruptions, personality shifts, and more!

You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.

Your focus

  • Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
  • Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
  • Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation

What you'll bring

  • Deep experience working with speech and audio data at scale — 1M+ hours
  • Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
  • Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
  • A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification

Nice to have

  • Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
  • Open-source contributions or publications in speech or audio ML
  • Background in denoising and enhancement, and an understanding of how they affect downstream model quality

Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/03/2026
Job ID: 34412

Want to build the systems that make AI agents actually work in production?

Most agents fail outside controlled environments, not because the models are weak, but because the systems around them can’t represent how real work happens.

This team is building that missing layer...

Their platform sits inside enterprise workflows, capturing how tasks are executed across tools, then structuring that data so models and agents can actually use it. Real operational context, not synthetic benchmarks.

As a Full Stack Engineer, you’ll focus on the backend and product systems that make this usable in production.

You’ll design workflow data models, build high-throughput pipelines, and ship full-stack features used by real customers. This sits across distributed systems, data engineering, and LLM integrations.

Tech stack includes TypeScript (NestJS, React, Vite, TanStack), PostgreSQL, and AWS/GCP, with OpenAI and Anthropic models integrated into core systems.

You’ll join a small, highly technical, Accel-backed team that’s already post-revenue and scaling with enterprise customers. This isn’t speculative infrastructure; it’s already in use.

Experience with Python pipelines, Terraform monorepos, or Rust/Swift is useful, but not essential.

What matters is your ability to build systems that hold up in real-world complexity.

📍 San Francisco (on-site)
💰 $160K–$280K base + equity + additional comp

If you’re interested in building the layer that makes AI agents usable, this is where that work is happening.

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 19/03/2026
Job ID: 35404

Advance how agents and LLMs learn from feedback in realistic environments.

If you've been working at the intersection of reinforcement learning and large language models, this is an opportunity to work on the foundations of how AI systems are trained, evaluated, and supervised, with your research shipping into production.

You will work hands-on with fundamental problems spanning LLM post-training, RL simulation environments, and agentic evaluation, shaping core methods and benchmarks used by leading AI labs and enterprises around the world.

The team actively publishes and collaborates with external research labs, with recent work appearing at ACL and NeurIPS. You'll see your ideas move from concept to deployed systems, working alongside engineers who build fast and take research seriously.

This is a research-driven company growing quickly due to real demand for what they're building. If you want your work to matter, both in the literature and in production, this is where to do it.

You'll bring hands-on experience in applied research across RL, LLM post-training, or agent-based systems, with a strong understanding of transformer architectures and fine-tuning. As important as the theory is the ability to ship — you can translate research ideas into production-ready systems that actually work. A track record of publishing at top-tier venues such as NeurIPS, ICML, ACL, or EMNLP is a plus, but what matters most is the quality of your thinking and your ability to execute.

What you'll do

  • Conduct research on LLM post-training methods (RLHF, RLAIF, RLVR)
  • Design and build realistic RL simulation environments for agents
  • Develop agentic evaluation and supervision frameworks
  • Create and maintain benchmarks for emerging AI capabilities
  • Collaborate with engineers to take research from idea to deployed systems

Location: San Francisco
Salary: Up to $300k base + equity, flexible and negotiable DOE

All applicants will receive a response.

 

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 12/03/2026
Job ID: 33119

Rip up the playbook and step into uncharted territory.

If you've been building long-horizon multi-agent systems and pushing the boundaries of AI research, this is the kind of role where curiosity and ambition meet real execution, exploring truly novel problems at the frontier of what's currently possible.

You will work on systems designed to outperform the current state of the art, tackling problems that don’t yet have standardised solutions across RL, long-horizon reasoning, LLM post-training for non-myopic objectives, and environment and feedback design.

Whether you’re an early-career PhD or highly experienced, what matters most is your ability to push novel ideas into working systems, applying your knowledge across reasoning, RL, and memory to make real-world impact.

This is a small, ambitious team operating where few others are, building and executing quickly in areas such as computational R&D for science. It’s your opportunity to shape the systems that generate and validate new discoveries, in an environment primed for success.

Skills & experience

  • PhD and/or publications at top conferences across long-horizon reasoning, RL, or similar
  • Post-training experience (RLHF, DPO, reward modelling)
  • Experience working on open-ended research 

Location: San Francisco
Salary: $400k base + 0.5–1%+ equity, negotiable DOE

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 12/03/2026
Job ID: 35041

 

The Bot Company

We're building a helpful robot for every home.

We're a small team of engineers, designers, and operators based in San Francisco. Our team comes from Tesla, Cruise, OpenAI, Google, Pixar, and many other great companies. In the past we've shipped to hundreds of millions of users and know what it takes to build amazing products and experiences.

Our team is deliberately lean to promote rapid decision making and do away with bureaucracy and hierarchy. Everyone is an IC and is empowered with massive scope, radical ownership, and direct responsibility. We work across the stack with a culture built for rapid iteration and fast execution.

 

What we look for in all candidates

All roles at The Bot Company demand extreme sharpness and the ability to move fast in high-intensity environments. Throughout the process, we expect candidates to demonstrate:

• Exceptional mental acuity: you think quickly, learn instantly, and reason across unfamiliar domains.

• Engineering curiosity: you naturally dig into how systems work, even outside your specialty.

• High performance mindset: you move fast, handle ambiguity, and excel when the environment is demanding.

 

Machine Learning: World Models

We are building neural simulators that understand the grammar of the physical world—including physics, causality, and long-term dynamics.

This role focuses on pushing video generation systems beyond short clips into controllable, large-scale world models that can simulate environments, actions, and interactions over long time horizons. These models will serve as the foundation for robotic intelligence, enabling robots to reason about the future, anticipate consequences, and learn from simulated experience.

You will work on the frontier of generative modeling and large-scale training to build spatiotemporal models capable of learning coherent world dynamics.

What You’ll Do

• Architect Neural Simulators: Design and train large-scale spatiotemporal models capable of learning long-horizon dynamics and physical interactions.

• Build World Models: Develop controllable video generation systems that evolve from short clips into coherent, persistent simulations of the real world.

• Scale Training: Train and optimize multi-billion parameter models across massive GPU clusters.

• Own the Training Loop End-to-End: Design, run, debug, and iterate on large-scale training experiments, diagnosing failure modes, improving data mixtures, and refining evaluation.

• Push Model Architecture Forward: Develop novel approaches for scaling temporal coherence, memory, and controllability in generative models.

• Work Across the Stack: Collaborate with infrastructure, robotics, and autonomy teams to integrate world models into broader robotic intelligence systems.

 

Requirements

• Very strong coding skills in Python, C++, or Rust.

• Video Generation Expertise: deep experience building or researching high-fidelity video generation systems.

• Architectural Intuition: ability to design novel model architectures and reason about scaling laws, emergent behavior, and failure modes.

• Infrastructure Fluency: comfortable managing large-scale experiments across massive GPU clusters.

• Strong understanding of modern generative modeling techniques (diffusion, transformers, autoregressive models, or related approaches).

 

Why Join

You’ll work with a small, elite team on challenges that require speed, intelligence, and deep engineering instinct. If you enjoy understanding systems at all levels, moving fast, and thinking even faster, you’ll thrive here.

Location: San Francisco
Employment Type: Full-time
Location Type: On-site
Department: Engineering / Software

Compensation: Base $200K – $350K. Actual compensation will depend on skills, experience, and qualifications.

Base salary is one part of the total compensation package. The role is also eligible for equity through the company’s discretionary equity program, along with a comprehensive benefits package that includes medical, dental, and vision coverage, and access to a 401(k) plan.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 09/03/2026
Job ID: 35305

Want to own the inference layer behind millions of real-world voice AI interactions every day?

You’ll join a profitable, founder-led enterprise conversational AI company powering billions of interactions annually across 30+ languages. Their systems sit behind major global brands and handle millions of customer conversations daily.

They’re now moving toward end-to-end multimodal and speech-to-speech architectures. You’ll own the inference stack powering both their multimodal speech-text LLM and their text reasoning LLM.

This goes well beyond tuning configs.

You will:

• Optimise production inference across A10, A100 and H100 GPUs
• Own scheduler design, KV cache allocation and batching logic
• Build serving systems tailored to multimodal audio-text workloads
• Support agentic, multi-step reasoning under real latency constraints
• Profile kernel-level bottlenecks and fix them properly

You’ve modified inference framework internals before, not just used them. You’re comfortable in Python and C++, and you’re happy diving into CUDA graphs, memory bandwidth limits or custom kernels when required.

This platform processes over 2 million interactions per day. Latency, throughput and cost are production realities, not lab metrics.

Package: €150,000 base + bonus + stock options
Location: Remote within Europe

If you want full ownership of inference performance at real enterprise scale, let’s talk.

All applicants will receive a response.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 03/03/2026
Job ID: 35239

New York City
$200,000–$250,000 + equity

Define how AI agents actually learn in production.

This team is building the foundational learning framework behind enterprise AI systems.

Not prompt wrappers.
Not repeated fine-tuning.

A system that formalises how work gets done and allows agents to improve continuously in real environments.

You’ll design architectures that turn operational behaviour into structured, executable intelligence — making knowledge compound over time through reasoning loops, persistent memory, and human-in-the-loop feedback, without degrading performance.

You’ll work directly with experienced founders and live enterprise customers on problems where reasoning, context, and workflow execution intersect.


What you’ll work on

  • Expanding the core learning framework that governs how agents improve

  • Designing structured context and memory layers

  • Building reasoning loops and feedback systems

  • Creating continuous learning pipelines from live operational data

  • Shipping production-grade Python systems into real deployments


What you’ll bring

  • Experience building non-trivial LLM systems in production

  • Experience designing agentic workflows involving reasoning, memory, and tool use

  • Strong Python engineering and systems thinking

  • Clear ownership of end-to-end AI systems


The company

  • Series A backed by Sequoia ($28M)

  • Platform approaching one trillion tokens processed

  • Major enterprise customers live

  • Small, engineering-led team building the learning layer enterprise AI will depend on


Everyone will receive a response.

Location: NYC
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 01/03/2026
Job ID: 35206

Training builds capability. Post-training decides what it becomes.

This team are rethinking how large multimodal models learn after pre-training — developing post-training and reinforcement learning methods that help models reason, plan, and interact in real time.

Founded by the researchers behind several of the most influential modern AI architectures, this lab are pushing alignment and learning efficiency beyond standard RLHF. They’re scaling preference-based training (RLHF, DPO, hybrid feedback loops) to new model types and creating systems that learn from interaction rather than static data.

You’ll work at the intersection of post-training, RL, and model architecture — designing reward models, scalable evaluation frameworks, and training strategies that make large-scale learning measurable and reliable. It’s applied research with direct impact, supported by serious compute and a tight researcher-to-GPU ratio.

You’ll bring experience in large-scale post-training or reinforcement learning (RLHF, DPO, or SFT pipelines), a solid grasp of LLM or multimodal training systems, and the curiosity to explore new optimisation and alignment methods. A publication record at top venues (NeurIPS, ICLR, ICML, CVPR, ACL) is a plus, but impact matters more than titles.

The team are based in San Francisco, working mostly in person. $1 million+ total compensation. Base salary circa $300K – $600K (negotiable) plus stock and bonus — exact package depends on experience.

If you want to work where post-training meets architecture — shaping how foundation models learn, reason, and adapt — this is that opportunity.

All applicants will receive a response.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 11/02/2026
Job ID: 34012