Your search has found 7 jobs

Applied Scientist – Vision Language Models (Multimodal Reasoning)

Ready to build VLMs that go beyond captioning and simple grounding?

This role is centred on advancing vision-language models that power intelligent agents operating in complex, real-world environments. The focus is firmly on multimodal model design, training, and post-training, with some computer vision work in the mix.

As an Applied Scientist, you’ll work on large multimodal models that integrate visual inputs with language-based reasoning. You’ll explore how VLMs can move from recognition and description toward structured understanding, task execution, and agentic decision-making.

Your work will include designing model architectures, improving cross-modal alignment, and developing post-training strategies that strengthen reasoning, factual consistency, and controllability. You’ll contribute across the full lifecycle, from data curation and supervised fine-tuning through to preference optimisation and evaluation.

This is a research-heavy role with clear production impact. You’ll prototype new ideas, run rigorous experiments, and collaborate with engineering teams to deploy models into live agent workflows.

Your focus will include:

  • Training and fine-tuning large-scale vision-language models
  • Improving multimodal alignment between image and text representations
  • Applying post-training techniques such as SFT, RLHF, DPO, and reward modelling
  • Designing evaluation frameworks for reasoning quality, grounding accuracy, and robustness
  • Working with large multimodal datasets, including synthetic and proprietary data

Hands-on work with VLMs or multimodal foundation models is essential. Experience in post-training, alignment, or preference learning is highly valued.

A solid understanding of how to evaluate multimodal systems, including hallucination, grounding failures, and reasoning gaps, is important. You should be comfortable reading and implementing recent research, and designing experiments that move models forward in measurable ways.

You’ll have ownership over modelling decisions and the opportunity to influence how multimodal intelligence is shaped within a fast-growing AI team.

Compensation: $200,000 - $320,000 base (negotiable depending on level) + bonus + meaningful equity + benefits

Location: SF Bay Area or Miami (Hybrid). Remote flexibility in the short term.

If you’re motivated by pushing vision-language models toward deeper reasoning and real-world capability, we’d like to speak with you!

All applicants will receive a response.

Location: United States
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 18/05/2026
Job ID: 33847

Want to own how an AI product actually thinks at scale?

You’ll join a team building one of the largest conversational AI platforms globally, already used by 50M+ people and growing fast. This isn’t an API wrapper or a thin product layer. AI is the product.

You’ll take ownership of the core model behaviour, shaping how the system responds, adapts, and improves across millions of real conversations. That means working where model design meets product reality, where latency, cost, safety, and user experience all collide.

You’ll lead from the front. Still hands-on, still in the code, but responsible for the direction.

The work sits across post-training, inference, and system design. You’ll be making decisions that directly affect how users experience the product every day.

Your focus will include:

  • Owning LLM behaviour across a high-scale conversational system
  • Fine-tuning and adapting open-source models such as Llama, Mistral, and Qwen
  • Improving response quality, alignment, and conversational memory
  • Designing evaluation pipelines that reflect real user interactions, not just offline benchmarks
  • Optimising inference for latency, cost, and reliability at scale

You’ll also lead a small team, setting direction while staying close to implementation. This is not a step away from the work.

There’s real technical ownership here. You’ll define trade-offs across:

  • RAG versus fine-tuning approaches
  • Model selection and architecture decisions
  • Scaling strategies across compute, latency, and cost

You’ll likely have experience building and deploying LLM systems in production, not just experimenting. You understand how models behave in messy, real-world environments and how to improve them iteratively.

Background-wise, you might come from conversational AI, assistants, or agent-based systems. You’ve probably worked with post-training methods like LoRA, QLoRA, SFT, RLHF, or DPO, and you’re comfortable with modern tooling across PyTorch, Hugging Face, and inference stacks.

Why this role?

You’ll be working on a product with real usage at global scale. The feedback loop is immediate. Changes you make will impact millions of interactions.

The team moves quickly. Ideas are tested and shipped in days, not quarters. There’s minimal process overhead and a strong bias toward building.

You’ll also be operating in a product space that brings real complexity, including content moderation and safety challenges. It’s not a clean lab environment; it’s production AI with all the edge cases that come with it.

Package

Salary: ~$200,000 base + ~$80,000 equity
Location: Fully remote (global)
Type: Full-time (B2B or employment)

If you’re looking to own LLM systems at scale, technically and directionally, this is worth exploring.

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: USD $200,000.00
Job published: 30/04/2026
Job ID: 35800

Want to build the simulated worlds that test what frontier models are really capable of?

This is a chance to join a team advancing the science of post-training and scalable evaluation — building reinforcement learning environments that push reasoning, planning, and long-horizon behaviour to their limits.

Instead of static benchmarks, you’ll create dynamic simulations that measure real intelligence — not just accuracy. You’ll design new post-training algorithms (RLHF, DPO, GRPO and beyond), develop richer reward models that move past exact-match scoring, and build evaluation frameworks that define how next-generation AI is trained, aligned, and understood.

The work combines deep research with hands-on implementation — from writing papers to seeing your methods deployed in live systems. It’s ideal for researchers who care about bridging academic insight and practical impact, helping AI progress beyond metrics that no longer tell the whole story.

You’ll bring:

  • Research experience in post-training, reinforcement learning, or evaluation for LLMs.

  • Strong understanding of transformer models and experimental design.

  • Publication record at leading venues (NeurIPS, ICLR, ICML, ACL, EMNLP).

  • PhD or equivalent research experience in CS, ML, NLP, or RL.

Package: Up to $300K base (DOE) + meaningful equity + comprehensive benefits (401k, unlimited PTO, relocation and sponsorship available).
Location: On-site or hybrid San Francisco.

If you want to shape how AI is trained, tested, and trusted — this is the place to do it.
All applicants will receive a response.

Location: San Francisco
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/04/2026
Job ID: 33814

Teach AI how to reason — safely, transparently, and at scale.

How do we move beyond pattern-matching into true machine reasoning? This Applied Scientist role puts you at the centre of that challenge — developing models that can reason, explain their logic, and make verifiable decisions across complex, high-stakes industries.

You’ll join a well-funded startup building domain-specific reasoning systems and agentic AI for sectors like medtech, aerospace, and advanced manufacturing, where reliability and interpretability aren’t optional.

Your work will focus on post-training large multimodal models, applying the latest techniques in RLHF, DPO, and preference learning to make AI systems more consistent, factual, and aligned with human reasoning. You’ll design the frameworks that turn raw model potential into transparent, trustworthy intelligence.

You’ll develop and optimise post-training pipelines, implement reward modelling for reasoning depth and factual accuracy, and build evaluation frameworks for verifiable, human-aligned behaviour. Working with proprietary and synthetic datasets, you’ll run end-to-end experiments and deploy your methods directly into production.

You’ll bring a background in transformer-based model training (LLM, VLM, MLLM), post-training or alignment (RLHF, DPO, reward modelling), and strong practical skills in Python and PyTorch. Curiosity about reasoning agents, hybrid learning, and interpretability research will help you thrive here.

Bonus points for experience in multimodal reasoning, evaluation and verification, or prior research contributions in alignment or reasoning systems.

The company has raised $20M+ (Series A announcement imminent) and already partners with Fortune 100 and Fortune 500 customers. It was founded by an entrepreneur with a prior billion-dollar exit, and the AI team alone is scaling from 11 to 40+ this year.

Comp: $200K–$320K base (negotiable depending on experience) + bonus + stock + benefits
Location: SF Bay Area (remote for now; hybrid later in 2026)

If you’re excited about defining how AI systems reason, decide, and explain themselves — we’d love to hear from you.

All applicants will receive a response.

Location: San Francisco Bay Area
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 26/01/2026
Job ID: 34909

Build the 3D perception that gives AI agents real spatial intelligence.

How do AI systems truly see and reason about 3D geometry? This Applied Scientist role puts you at the centre of that challenge — developing models that bridge the physical world and intelligent reasoning systems.

You’ll join a well-funded startup building AI agents for advanced design and engineering workflows — across manufacturing, aerospace, and medtech. Your work will enable agents to understand CAD data, meshes, and point clouds deeply enough to plan, analyse, and make autonomous decisions.

This is a rare opportunity to establish the 3D foundation within the research team. You’ll define evaluation strategies, model objectives, and technical direction — building models that become the perception backbone for intelligent agents.

What you’ll do:
• Develop models that learn transferable 3D representations across CAD, mesh, and point cloud data
• Handle messy, lossy, real-world data — not just clean synthetic geometry
• Scale training across segmentation, classification, correspondence, and eventually generation
• Design robust evaluation pipelines for continuous performance monitoring
• Work toward a unified 3D foundation model supporting both discriminative and generative tasks

You’ll bring:
• Deep expertise in 3D computer vision (PhD or equivalent experience)
• Strong knowledge of modern 3D architectures and related methods (PointNet++, MeshCNN, Gaussian Splatting, Diffusion, VLMs)
• Proven ability to train large-scale models in PyTorch
• Strong applied research instincts — turning papers into working systems
• Experience with multimodal or vision-language models

Bonus points:
• Background with CAD data or industrial design workflows
• Experience in robotics, autonomous driving, or AR/VR 3D perception
• Familiarity with SLAM, pose estimation, or differentiable rendering

You’ll join a small, research-driven team with full autonomy and major compute access — free to explore foundational methods while delivering practical impact.

Compensation & location:
• Base salary: $200K–$300K (negotiable by level)
• Up to 20% bonus + stock
• Full medical, dental, and vision coverage
• 401k (3% match) and 20+ vacation days

Based in the SF Bay Area (currently remote, moving hybrid soon).
Applicants must hold valid US work authorisation (US Citizen or Green Card).

If you’re excited about building the 3D understanding that will power the next generation of intelligent agents — we’d love to hear from you.
All applicants will receive a response.

Location: United States
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 07/01/2026
Job ID: 33515

Want to build the large-scale RL environments frontier labs use to train agents that can truly reason and act?

This team is creating complex reinforcement learning environments: simulations where advanced agents learn to plan, adapt, and solve multi-step problems that stretch beyond standard benchmarks. The focus isn’t on training the models themselves, but on building the worlds that make meaningful learning and evaluation possible, the foundation for more capable, aligned systems.

You’ll work end-to-end across environment design, reward dynamics, and scalable simulation — developing the feedback loops that define what “good” looks like for intelligent behaviour. It’s open-ended, research-driven work where the task definition, data, and reward structure are often the hardest and most important problems to solve.

You’ll collaborate closely with researchers tackling unsolved challenges in reinforcement learning and agent behaviour, shaping experiments, scaling infrastructure, and refining how agents learn in the loop.

It suits someone with strong ML and RL experience, deep intuition for agent dynamics, and the curiosity to explore problems that don’t come with clear instructions.

On-site in San Francisco. Compensation up to $300K base (negotiable, depending on experience) plus equity.

If you want to help build the environments that teach the next generation of AI systems how to think, act, and adapt — we’d love to hear from you.

All applicants will receive a response.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 06/01/2026
Job ID: 34645

Are you the kind of engineer who enjoys building complex systems that help models learn, not by training them directly, but by shaping the worlds they inhabit?

This team builds large-scale environments and benchmarks that frontier AI labs use to test and steer their models. Their goal is to make reinforcement learning measurable, creating rich, hyperrealistic simulations where agents can reason, act, and be safely evaluated.

You’ll work at the intersection of software engineering, reinforcement learning, and experimental research, designing the frameworks and pipelines that let agentic AI systems act, learn, and improve through interaction, not static data.

You’ll bring:

  • Strong Python and software fundamentals, and enjoyment of building ML infrastructure.

  • Experience in reinforcement learning: rewards, environment dynamics, and evaluation loops.

  • Experience with browser/API simulations (Playwright, Selenium) or distributed compute.

  • Experience with open-ended problem spaces and a desire to shape the tools driving safe AGI progress.

It’s a technically deep team of ML engineers and researchers from leading labs and tech companies, developing the simulation and evaluation backbone for next-generation agents.

Compensation: $200,000–$250,000 base + equity
Location: San Francisco (on-site, relocation supported)

All applicants will receive a response.

Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 18/11/2025
Job ID: 34513