Job Description
Ready to own the data pipeline powering the voice of the next generation of AI characters?
You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.
Think highly natural conversations that handle interruptions, personality shifts and more!
You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.
Your focus
- Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
- Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
- Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation
What you'll bring
- Deep experience working with speech and audio data at scale — 1M+ hours
- Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
- Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
- A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification
Nice to have
- Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
- Open-source contributions or publications in speech or audio ML
- Background in denoising and enhancement, and an understanding of how they affect downstream model quality
Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.