
ML Model Serving Engineer

Want to build the layer that actually makes AI usable in real time?

You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.

They’re building real-time AI systems that need to respond instantly, reliably, and at scale. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.

You’ll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits, extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.

This is not about training models. It’s about making them fast, efficient, and production-ready.

What you’ll work on:

  • Building high-performance serving systems for LLM, speech, and vision models
  • Scaling inference to production workloads with strict latency requirements
  • Optimising GPU utilisation and execution efficiency
  • Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
  • Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
  • Profiling and debugging performance across GPU, memory, and system layers

What you’ll bring:

  • Strong experience with ML inference or model serving systems
  • Deep understanding of latency and throughput optimisation in production
  • Solid Python and PyTorch skills, plus a systems or performance engineering mindset
  • Familiarity with distributed systems and production infrastructure

Exposure to CUDA, GPU profiling tools, or systems like Kubernetes and Ray is useful, but the key is knowing how to make models run efficiently at scale.

You’ll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.

There’s real ownership here. You’ll help define how next-generation AI systems are served.

Package:
$220,000 – $320,000 base + equity
San Francisco, onsite 3 days per week

If you’re interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 34247

Research Engineer – Computer Vision & Machine Learning

Want to build vision systems that let machines understand the physical world as naturally as we do?

This role sits within a highly technical team developing a new class of computing devices where perception, language, and interaction are tightly integrated. Vision is a core capability. Your work will directly influence how machines see, reason about space, and collaborate with humans in real-world environments.

You’ll join a specialist vision group working across 3D computer vision and machine learning. The problems sit at the boundary between learned models and physical reality, including gaze tracking, SLAM, multi-camera geometry, and systems that explicitly model optics, refraction, and light transport. The focus is on geometry-aware, physically grounded approaches rather than purely pixel-driven modelling.

This is a hands-on research engineering role. You’ll move between reading papers, building and training models, designing datasets, running controlled experiments, and deploying onto real hardware. You’ll work closely with firmware and hardware teams to ensure models operate reliably on-device.

Your work will include:

  • Developing ML models across 3D perception, tracking, and spatial understanding
  • Designing model architectures, training pipelines, evaluation frameworks, and inference systems
  • Working with large-scale, multi-camera and sensor-rich datasets
  • Translating state-of-the-art research into robust, production-ready systems
  • Creating new approaches when existing methods do not meet performance or physical constraints

You’ll have genuine technical ownership. The team values clear thinking, strong experimental discipline, and the ability to make informed bets on promising ideas.

You’ll likely bring end-to-end experience building computer vision and ML models, alongside strong familiarity with modern research in 3D or geometry-aware vision. Hands-on experience with PyTorch or JAX is expected, as is comfort working with complex datasets. The ability to operate independently in ambiguous environments is important, as is clear communication across research, hardware, and product teams.

A Bachelor’s degree or higher in computer science, machine learning, computer vision, applied mathematics, or a related field is required. A Master’s or PhD is a plus, particularly if you’ve worked on geometry-aware or physically informed modelling approaches. Experience deploying ML systems into real products or working in high-ownership startup environments would be valuable.

Compensation: $190,000 - $320,000 base (depending on experience) + equity
Benefits: 401(k) matching, 100% employer-paid health, vision, and dental insurance, unlimited PTO and sick time, medical FSA matching
Location: San Francisco, on-site collaboration required

If you’re motivated by building geometry-aware vision systems that connect AI to the physical world in meaningful ways, we’d like to hear from you!

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/04/2026
Job ID: 34942

Most AI systems work in demos. Very few hold up in real customer environments.

This team is building the decision-making systems behind AI agents that operate across voice, chat, and email — where performance is measured in outcomes, not benchmarks.

You’ll work on models that need to reason over time, handle multi-step workflows, and stay consistent across entire interactions. Not just once, but repeatedly, under real-world constraints.

This is applied research that ships. You’ll take ideas from early concept through to production, owning how systems behave when deployed at scale.

The challenge is not just capability. It’s reliability — making reasoning systems that can operate across long-context interactions, manage memory, use tools, and execute workflows without breaking down.

You’ll be working closely with product and engineering teams, iterating on real-world failures, and improving systems based on how they actually perform in production.


What you’ll work on

  • Designing and improving reasoning systems for real-world agent workflows
  • Building and refining memory, retrieval, and multi-step execution systems
  • Developing post-training and evaluation approaches for deployed models
  • Iterating on systems based on real user behaviour and performance
  • Taking research ideas through to production environments

What they’re looking for

  • Experience working on LLM systems in production
  • Background in RL, post-training, or agent-based systems
  • Experience building systems involving memory, reasoning, or tool use
  • Strong engineering fundamentals and ability to ship end-to-end systems
  • Clear understanding of how models behave outside of controlled environments

Why this role

  • Work on systems judged by real users, not offline metrics
  • Direct ownership of how models behave in production
  • High autonomy in a fast-moving, product-driven team
  • Real-world complexity, not sandboxed problems

Package

📍 San Francisco or London (on-site)
💰 $200K–$400K base + equity

All applicants will receive a response.

Location: SF, onsite
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35338

What if AI systems could run full research loops — not just generate outputs, but form hypotheses, design experiments, and produce new scientific insight?

This team is building autonomous AI scientists that do exactly that. Their systems ingest large bodies of scientific literature, reason across them, and generate traceable outputs already used by teams in life sciences.

The problem is no longer getting models to produce plausible answers. It’s pushing them to plan, explore, and iterate across complex domains — reliably, and at scale.

You’ll join a team working at the edge of this shift, developing models that move beyond instruction following into structured, multi-step scientific reasoning.

This is not research in isolation. Your work will be deployed into real systems used by scientists, where model behaviour directly impacts what the platform can discover.

You’ll work closely with engineers and domain experts across biology and chemistry, translating open-ended problems into systems that can be trained, evaluated, and improved in production.

The company originated from one of the earliest groups working seriously on AI for science, including early language agents and AI-generated discoveries. They’re now pushing further with systems capable of long-horizon reasoning across huge amounts of data.

They’ve primarily focused on post-training and reasoning so far, and are now moving into pre-training their own models to support this end-to-end.


What you’ll work on

  • Developing models that can reason across long-horizon scientific problems
  • Designing post-training methods to improve multi-step decision making
  • Working on sampling, exploration, and evaluation in complex environments
  • Building systems that move from research ideas into production workflows
  • Collaborating with scientists to define problems and validate outputs

What they’re looking for

  • Strong background in machine learning research (RL, representation learning, or related areas)
  • Experience pre-training or post-training LLMs
  • Track record of applying ML to real-world or complex domains
  • Strong programming skills (PyTorch, JAX, or similar)
  • Ability to work across research and applied systems

Why this role

  • Work on systems that aim to automate scientific discovery
  • Direct impact on real-world research and outcomes
  • Small, high-calibre team across AI and science
  • Real traction, not just prototypes

Package

📍 San Francisco (on-site or hybrid). Other locations considered: NYC, London.
💰 $200K–$400K base + stock

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35536

Want to build the systems that make AI actually useful inside real companies?

This Series A startup is tackling one of the hardest problems in enterprise AI. Models are generic. Company processes aren’t. They’re building AI agents that learn how work actually happens, then run those operations end-to-end.

Backed by top-tier investors, they’ve built a deeply technical team across engineering, AI research, and strategy. The focus is simple. Build things properly, with people who care about the craft.

You’ll join as an early full stack engineer, shaping both the product and the foundations it’s built on.

The work sits around the models, not inside them. You’ll build the platform, workflows, and interfaces that make AI usable in real-world environments. That means designing systems that are reliable, observable, and genuinely pleasant to work with, both for users and other engineers.

There’s no separation between building and shipping here. You’ll take ideas from whiteboard to production, owning the outcome end to end. The bar is high, but so is the autonomy.

You’ll spend time designing clean API contracts, modelling data properly, and building frontends that don’t fight you six months later. Velocity matters, but not at the expense of quality.

Your focus will include:

  • Designing and building backend systems in Python using FastAPI, from API design through to database schema and infrastructure
  • Creating high-quality frontend experiences in TypeScript and React, with strong typing and clean component architecture
  • Building shared libraries, internal tooling, and component systems that improve how the whole team ships
  • Owning problems end to end, from shaping ambiguous requirements through to production deployment
  • Developing integrations, connectors, and data pipelines that tie the platform into external systems

You’ll also have real input into how the product evolves, working closely with design and product to understand how customers use what you build.

This is greenfield work. The decisions you make now will compound over time.

They’re looking for engineers who care about how things are built, not just that they work.

  • You enjoy writing Python and TypeScript to a high standard, with strong typing and clear structure
  • You think carefully about data models and take pride in getting schema design right
  • You’ve built libraries, SDKs, or internal tooling that improved developer experience
  • You’re comfortable owning problems end to end, even with ambiguity
  • You have good product instinct and care about how things feel to use

Experience-wise, around 3+ years is a useful guide, but what matters more is how you think and build.

You’ll join a small, high-calibre team where you can influence tooling, patterns, and technical direction from day one.

Compensation: Up to $250,000 base + equity
Location: New York (in-person)

If building foundational systems properly, with real ownership, sounds like your kind of environment, it’s worth a conversation.

Location: NYC
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35899

Want to build product experiences for AI agents that actually understand how companies operate?

This team is tackling a core limitation in enterprise AI. Models are general, but workflows are not. They’re building agents that learn how processes really run, then execute them. This is not surface-level UI work: you’re building the interface layer to systems that directly operate inside real business workflows.

They’re a Series A company backed by Sequoia, with a deeply technical team across engineering and AI. As an early frontend engineer, you won’t just build features, you’ll shape how the product feels, how it’s structured, and how other engineers build on top of it.

You’ll work primarily in TypeScript and React, building high-quality, user-facing experiences that sit on top of complex AI systems. Strong typing, clean abstractions, and thoughtful API design matter here. This is a team that values well-modelled systems over quick fixes.

There’s no separation between building and shipping. You’ll take ideas from concept to production, owning decisions across architecture, UX, and implementation. You’ll work closely with design, contributing to interface decisions and helping define how users interact with agent-driven workflows.

You’ll also have real influence on frontend architecture, tooling, and patterns. Whether that’s building a component library, shaping state management decisions, or improving how the frontend integrates with backend systems, your decisions will compound as the team scales.

What you’ll focus on:

  • Building production-grade frontend applications using TypeScript and React
  • Designing and contributing to frontend architecture, patterns, and component systems
  • Collaborating closely with design to shape UX and interface decisions
  • Owning features end-to-end, from scoping through to deployment
  • Debugging across the stack, including tracing issues into API and backend layers

What you’ll bring:

  • Strong experience with TypeScript, with a focus on well-typed, maintainable code
  • Solid React fundamentals, building scalable and performant interfaces
  • Experience designing APIs, data models, or internal tooling that improves developer workflows
  • Good product and interaction judgement, comfortable working closely with design
  • Comfort owning ambiguous problems and turning them into clear, deliverable solutions

You’ll likely have around 3+ years in software engineering, but what matters more is how you think about systems. If you enjoy building from scratch, care about clean abstractions, and take pride in code that other engineers enjoy working with, you’ll fit well here.

Bonus if you’ve worked with Next.js, built design systems, or developed libraries and SDKs. But strong fundamentals in React and TypeScript are the priority.

This is a frontend-focused role, but you’ll be expected to understand the wider system. You should be comfortable debugging issues beyond the UI when needed.


Comp: Base salary up to $250,000 + equity
Location: New York (also growing in London)

If you’re motivated by building thoughtful systems, not just shipping features, this is the kind of environment where your work compounds over time.

Location: NYC
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35866

Want to build the infrastructure that makes AI agents actually work inside real companies?

AI models are powerful, but they’re generic. Enterprise workflows aren’t. This team is solving that gap, building a learning layer that turns messy internal context into structured, executable systems that AI agents can actually use.

You’ll join a deeply technical team working on a platform that learns from tickets, Slack, emails, logs, and knowledge bases, then converts that into versionable “skills” for AI. Think of it as a “GitHub for context”, a system that makes company knowledge readable, maintainable, and executable.

This isn’t model training. It’s everything that makes models useful in production.

You’ll design and build the backend systems that power this layer: APIs, data models, integrations, and tooling that connect into real enterprise environments like ServiceNow, Jira, Zendesk, and Salesforce. The platform is already operating at serious scale, processing vast amounts of operational data across large organisations.

The work is high ownership. You won’t be handed tickets. You’ll take problems from idea to production, shaping architecture, building systems, and seeing how they perform in real-world use.

Your focus will include:

  • Building backend systems in Python (FastAPI), from API design through to database schema
  • Creating integrations, connectors, and data pipelines across enterprise tools
  • Developing internal tooling and libraries that improve engineering velocity
  • Owning systems end-to-end, including deployment and observability

You’ll enjoy this if you care about how software is built. Strong typing, clean interfaces, and well-structured data models aren’t afterthoughts here; they’re core to how the team works.

You’re likely someone who takes pride in designing schemas properly, enjoys building systems other engineers rely on, and prefers thoughtful, robust solutions over quick fixes.

The company has raised a $28M Series A led by Sequoia and is already working with large enterprise environments, processing data at significant scale. It’s still early enough that your decisions will shape the platform and engineering culture long-term.

Package:

Comp: $190K - $250K + meaningful equity
Location: New York (also expanding in London)

If you’re interested in building the systems that make AI actually usable in the real world, this is worth exploring.

Location: NYC
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 35833

Define how large-scale AI systems for scientific discovery are actually built, trained, and run in production.

This team is building autonomous AI scientists that run full research loops — ingesting large bodies of literature, forming hypotheses, designing experiments, and producing traceable outputs already used across biotech and pharma.

The challenge isn’t just model capability. It’s building the systems that allow these models to be trained, evaluated, and deployed reliably at scale.

You’ll sit at the intersection of model training and systems — owning the infrastructure, pipelines, and experimentation platforms that make long-horizon reasoning systems possible.

This is not research in isolation. It’s building the engine that research runs on.

You’ll work closely with the wider team, translating ambiguous scientific problems into systems that can be trained, iterated on, and deployed in real-world environments.

The company comes from one of the earliest groups working seriously on AI for science, including early language agents and AI-generated biological discoveries. They’re now pushing further with systems capable of reasoning across thousands of papers and large-scale analyses, and moving toward pre-training their own models end-to-end.

The platform is already operating at scale, with tens of thousands of users and millions of queries, and is actively used in scientific workflows today.


What you’ll work on

  • Building and scaling training pipelines for large-scale LLM systems
  • Developing experimentation platforms that enable fast, reliable iteration
  • Designing data pipelines and systems for observability and reproducibility
  • Improving how training runs are orchestrated, monitored, and debugged
  • Supporting model deployment and inference for complex reasoning systems
  • Working closely with researchers to translate ideas into production systems

What they’re looking for

  • Experience building and scaling ML systems in production
  • Strong background across model training, data pipelines, and deployment
  • Experience with large-scale training or distributed systems
  • Fluency in frameworks like PyTorch, JAX, or similar
  • Strong engineering fundamentals and systems thinking
  • Ability to operate across ambiguity and own problems end-to-end

The company

  • ~$70M raised, with another round planned
  • Platform already at meaningful scale (tens of thousands of users, hundreds of millions of lines of code written by the agent)
  • Strong commercial traction 
  • Small, high-calibre team working at the intersection of AI and science

📍 San Francisco (on-site or hybrid, remote considered case by case)
💰 $250K–$400K base + equity
Levels: Senior, Staff, Principal
Roles available: ML Engineer, ML Infra, Research Engineers & Research Scientists 

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 16/04/2026
Job ID: 35767

Want to build the interface layer for an AI scientist?

You’ll join a team building autonomous AI agents designed to accelerate scientific discovery. The goal is simple: science moves too slowly, and they’re building systems that can change that.

This isn’t a typical frontend role. The product is an integrated research environment where scientists interact directly with AI models, workflows, and generated insights. Your work defines how usable that system actually is.

You’ll sit within the Platform team, working closely with researchers and product to turn complex, often messy scientific workflows into clear, intuitive interfaces.

The challenge is translating depth into clarity without losing fidelity.

You’ll be building high-performance frontend systems where data density, responsiveness, and usability all matter. Real-time interactions, dynamic visualisations, and scalable UI patterns are core to the product.

Your focus will include:

  • Building performant React applications for data-heavy workflows
  • Designing interfaces for real-time AI interactions and streaming data
  • Creating modular, scalable design systems used across the platform
  • Translating scientific and model outputs into usable visual interfaces

You’ll need strong frontend fundamentals, but more importantly, the ability to think in systems: understanding how users navigate complexity, how interfaces guide decision-making, and how performance impacts usability at scale.

There’s a strong emphasis on performance engineering. You’ll be profiling rendering behaviour, optimising asset loading, and ensuring smooth interaction across browsers and devices.

The product itself sits at the intersection of AI, biology, and research tooling. If you’ve worked on complex internal tools, data platforms, or visualisation-heavy applications, this will feel familiar, just at a deeper technical level.

You’ll likely have experience building production frontend systems with React (or similar), working with TypeScript, and handling real-time data flows such as WebSockets or GraphQL subscriptions. Experience with visualisation libraries like D3, Deck.gl or Three.js is highly relevant here.

The environment is highly collaborative. You’ll work closely with researchers to anticipate how the product should evolve, not just respond to specs.

This is an onsite role based in San Francisco, working with a team focused on building something that genuinely pushes forward how science gets done.

Salary: $175,000 – $240,000 + equity
Location: San Francisco, onsite

If you’re interested in shaping how scientists interact with AI systems, apply today.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 01/04/2026
Job ID: 35602

Want to build systems that actually hold up under long-running AI workloads?

Most agentic systems for science don’t fail at the model layer. They fail because the infrastructure can’t support long-horizon execution.

You’ll join a team building autonomous AI agents that run full research cycles: ingesting thousands of papers, forming hypotheses, running experiments, and producing traceable outputs used by real scientific teams.

The challenge is making that work in production.

You’ll own the systems behind it: APIs, data pipelines, and platform architecture designed for long-running workloads, large-scale ingestion, and iterative experimentation loops. This is full-stack in scope, but backend in depth, where system design decisions directly impact what the platform can do.

You’ll be working across:

  • Backend services in Python or Node, building scalable APIs (FastAPI/REST)
  • Data pipelines supporting agent execution and scientific workflows
  • Cloud infrastructure (AWS/GCP), containerisation (Docker, Kubernetes)
  • CI/CD, observability, and reliability for systems under continuous load

This isn’t a generalist full-stack role. You’ll need to understand how systems behave under heavy data and compute demands, and be comfortable making architectural trade-offs across distributed systems.

The team is small, high-calibre, and already running real workloads with revenue traction. Backed by $70M+, they’re building infrastructure that defines how AI is applied to scientific discovery.

Salary: $200,000–$350,000 + equity
Location: San Francisco (onsite)

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 30/03/2026
Job ID: 35569