Job title: Data Machine Learning Engineer
Job type: Permanent
Employment type: Full-time
Industry: Generative AI
Functional Expertise: Data Gen-Speech/TTS Speech-to-Speech
Salary type: Annual
Salary: negotiable
Location: Remote, worldwide
Job published: 04/05/2026
Job ID: 35965

Job Description

Want to own the data infrastructure behind some of the most naturalistic voice models in production?

You'll be joining a well-funded speech AI startup that has just closed their Series A, with strong enterprise traction and revenue that more than doubled last quarter. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents. Their models power hundreds of millions of conversations monthly.

Before training a single model, they built their own corpus — full-duplex, studio-quality conversational speech annotated by PhD linguists. As their MLE, you'll own the pipelines that turn that raw material into clean, training-ready data.

What you'll do

  • Own end-to-end data pipelines from raw audio ingestion through to versioned, training-ready datasets
  • Build quality systems that catch annotation errors and alignment issues before they reach a training run
  • Maintain the training infrastructure that keeps GPUs fed — dataloaders, streaming datasets, multi-modal batching
  • Build and iterate on tooling across speech representations including neural codecs, semantic tokens and mel features
  • Handle full- and half-duplex pipeline work including two-channel alignment and overlap handling

What you'll bring

  • Strong engineering fundamentals with experience building ML data pipelines at scale
  • Hands-on experience with speech or audio data
  • Solid understanding of speech representations and the tradeoffs between them
  • Experience with multi-channel audio data including diarisation and alignment

Nice to have

  • Experience with multilingual data pipelines
  • Large-scale training infrastructure experience — FSDP, DeepSpeed, Ray
  • Annotation tooling and human-in-the-loop systems

Remote-friendly. Competitive base plus stock.
