Want to build speech AI that actually sounds human?

You'll be joining a well-funded speech AI startup with strong customer traction. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents.

As a Staff Research Engineer, you'll work hands-on to expand their foundation models and push the boundaries of what's possible in speech AI: exploring multilingual capabilities, long-context generation, full-duplex modeling for natural conversations with interruptions, and novel architectures that balance speed with control.

What you'll do

  • Conduct research to advance their core speech models and extend product capabilities
  • Develop and experiment with new model architectures and training approaches
  • Work on large-scale model training and data systems
  • Collaborate with the team to take research from concept to deployed systems

What you'll bring

  • 3+ years of experience in speech synthesis, audio generation, or generative modeling
  • Experience with audio generation using LLMs
  • Solid background in modern language model architectures
  • Proven ability to ship research into production systems
  • Experience training large-scale models

Nice to have

  • Published research in speech or generative modeling
  • Experience with real-time speech systems or multimodal models

Ideally you'll be based in SF, but remote candidates worldwide can also be considered. Comp is up to $250K base DOE, plus equity.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 23/12/2025
Job ID: 34579

Looking to tackle novel speech challenges at scale?

You'll be joining a small but mighty speech AI company building proprietary speech tech from the ground up. With a strong customer base, your research will directly impact production systems serving enterprise customers, with the opportunity to see your work deployed at scale in real-world voice applications.

They're a well-funded startup with healthy revenue streams and immediate opportunities for high-impact research.

Your research

You'll be working on breakthrough speech research that pushes the boundaries of naturalness and real-time performance. The company has achieved ultra-low latency and is now advancing toward unified speech-to-speech architectures.

You'll develop emotional expression and natural speech generation, advance multilingual support across 30+ languages, and enhance voice cloning robustness.

Your focus

  • Lead cutting-edge research in SOTA speech models (TTS, ASR, or speech-to-speech)
  • Design, execute and iterate on experiments end-to-end
  • Drive speech controllability and naturalness improvements
  • Develop evaluation methodologies for speech quality assessment

What you'll bring

  • Deep understanding of cutting-edge speech models with end-to-end pipeline experience
  • Experience with large-scale model training
  • Strong background in speech model development and optimisation
  • Published work with demonstrable results in industry or academic settings

Nice to have

  • Performance optimisation experience for latency and compute efficiency
  • Experience with model fusion and unified architectures

This is a remote role, based in either the US or Europe. Competitive comp based on experience.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 23/09/2025
Job ID: 33913

Do you want to create emotionally expressive AI that transforms healthcare conversations?

A pioneering healthtech unicorn is building AI digital health agents designed to safely and empathetically assist patients. Their immediate focus is developing conversational AI with genuine emotional intelligence, with a longer-term vision of full-duplex communication capabilities.

As the Staff Research Scientist, you'll play a key part in making this a reality - building foundational speech models that understand and respond with the human-like emotion and natural conversation that healthcare demands.

What you'll do

  • Design and develop emotionally expressive speech models for healthcare conversations, working end-to-end from research through to productionizing models
  • Build conversational AI systems that can interpret and respond with appropriate emotional intelligence
  • Work on post-training techniques to enhance speech models' conversational and emotional capabilities
  • Tackle unique challenges including response time optimization, maintaining emotional consistency, and operating in noisy healthcare environments
  • Have the opportunity to publish your groundbreaking research

What you'll bring

  • 5+ years in speech technologies or related field
  • Hands-on experience with speech-to-speech systems (highly preferred), or strong experience in Text-to-Speech, Speech LLMs, emotional/expressive speech synthesis, or similar
  • Experience training models on large speech datasets
  • Ability to implement research papers from scratch

Bonus points for

  • Experience pre-training foundation models with speech (HuBERT, Wav2Vec, or similar)
  • Multimodal experience
  • Experience with inference technologies (vLLM, CUDA)

You'll be based in the Bay Area or be willing to relocate. You'll receive highly competitive comp (up to $350K base DOE) with substantial equity.

If you're excited about creating the next generation of emotionally intelligent speech AI that will revolutionise healthcare communication, click apply!

Location: Bay Area
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 16/04/2025
Job ID: 33086