Ready to own the data pipeline powering the voice of the next generation of AI characters?

You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.

Think natural conversations that handle interruptions, personality shifts, and more!

You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.

Your focus

  • Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
  • Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
  • Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation
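To give a flavour of the quality metrics mentioned above, here is a minimal word error rate (WER) sketch in plain Python. This is an illustrative implementation, not code from the company; the function name `wer` and its whitespace tokenisation are assumptions, and production pipelines typically use a library such as `jiwer` with proper text normalisation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Tokenisation here is naive whitespace splitting; real ASR evaluation
    normalises case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the cat sit")` gives one substitution over three reference words, i.e. roughly 0.33.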

What you'll bring

  • Deep experience working with speech and audio data at scale — 1M+ hours
  • Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
  • Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
  • A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification
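As a concrete example of one metric from the list above, a basic signal-to-noise ratio can be computed from sample power. This is a hedged sketch in plain Python (the function name `snr_db` is mine, not from the posting); real pipelines would operate on NumPy arrays via `torchaudio` or `librosa` and often estimate noise from non-speech segments found by VAD.

```python
import math

def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(mean signal power / mean noise power).

    Both inputs are sequences of audio samples; the noise sequence would
    typically come from silent or non-speech regions of the recording.
    """
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)
```

A signal with ten times the amplitude of the noise has 100x the power, i.e. 20 dB SNR.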

Nice to have

  • Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
  • Open-source contributions or publications in speech or audio ML
  • Background in denoising and enhancement, and an understanding of how they affect downstream model quality

Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 27/03/2026
Job ID: 34412

Want to build speech AI that actually sounds human?

You'll be joining a well-funded speech AI startup with strong customer traction. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents.

As a Staff Research Engineer, you'll work hands-on to expand their foundation models and push the boundaries of what's possible in speech AI: exploring multilingual capabilities, long-context generation, full-duplex modeling for natural conversations with interruptions, and novel architectures that balance speed with control.

What you'll do

  • Conduct research to advance their core speech models and extend product capabilities
  • Develop and experiment with new model architectures and training approaches
  • Work on large-scale model training and data systems
  • Collaborate with the team to take research from concept to deployed systems

What you'll bring

  • 3+ years of experience in speech synthesis, audio generation, or generative modeling
  • Experience with audio generation using LLMs
  • Solid background in modern language model architectures
  • Proven ability to ship research into production systems
  • Experience training large-scale models

Nice to have

  • Published research in speech or generative modeling
  • Experience with real-time speech systems or multimodal models

Ideally based in San Francisco, but remote worldwide can be considered. Compensation is up to $250K base depending on experience, plus equity.

Location: San Francisco, CA
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 23/12/2025
Job ID: 34579