techire ai Engagement Hub

Want to work on one of the hardest unsolved problems in voice AI — making it actually sound like a human conversation?

Most voice AI falls apart the moment a conversation gets messy. Someone interrupts, emotions shift, the flow breaks — and the model can't keep up.

A small, ambitious SF startup is tackling exactly these problems, building speech models that handle natural conversation the way humans actually experience it. They have a working prototype and early commercial traction across several high-profile industry verticals.

The role

As a Senior Research Scientist, your focus is post-training — curating data, fine-tuning pre-trained speech models, and building the evaluation infrastructure that validates it all. You'll work on large-scale models with access to significant data resources.

What you'll do

Shape the data that goes into post-training — sourcing, cleaning and structuring it for large speech models
Run supervised fine-tuning (SFT) on pre-trained speech models
Build evaluation workflows — automated and human-in-the-loop
Drive measurable improvements in hallucination rates, instruction-following and generalisation

What you'll bring

PhD in ML or a related field with a strong publication record
Hands-on experience training large speech models — ASR, TTS, or speech-to-speech
Solid post-training and SFT experience

The founding team includes a former founding engineer at a billion-dollar AI company, where they co-created one of the first generative models in the field, and the co-creator of the first generative voice at one of the world's largest tech companies.

Compensation is a base salary of $400k to $500k, plus generous equity.

Onsite in San Francisco. Relocation support is available for candidates already in the US who are willing to make the move.

Location:	San Francisco
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	30/04/2026
Job ID:	34047

ML Engineer, Speech Data

Ready to own the data pipeline powering the voice of the next generation of AI characters?

You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.

Think highly natural conversations that handle interruptions, personality shifts and more!

You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.

Your focus

Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation

What you'll bring

Deep experience working with speech and audio data at scale — 1M+ hours
Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification

Nice to have

Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
Open-source contributions or publications in speech or audio ML
Background in denoising and speech enhancement, including how they affect downstream model quality

Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.

Location:	Remote
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	27/03/2026
Job ID:	34412
