Job title: ML Engineer, Speech Data
Job type: Permanent
Employment type: Full-time
Industry: Generative AI
Functional Expertise: ASR/Speech Rec, Audio Gen-Speech/TTS, Voice Conversion/Dubbing
Salary type: Annual
Salary: negotiable
Location: Remote
Job published: 27/03/2026
Job ID: 34412

Job Description

Ready to own the data pipeline powering the voice of the next generation of AI characters?

You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.

Think highly natural conversations that handle interruptions, personality shifts and more!

You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.

Your focus

  • Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
  • Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
  • Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation

What you'll bring

  • Deep experience working with speech and audio data at scale — 1M+ hours
  • Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
  • Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
  • A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification

Nice to have

  • Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
  • Open-source contributions or publications in speech or audio ML
  • Background in denoising and enhancement, and an understanding of how they affect downstream model quality

Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.