Job Description
Want to work on one of the hardest unsolved problems in voice AI — making it actually sound like a human conversation?
Most voice AI falls apart the moment a conversation gets messy. Someone interrupts, emotions shift, the flow breaks — and the model can't keep up.
A small, ambitious SF startup is tackling exactly these problems, building speech models that handle natural conversation the way humans actually experience it. They have a working prototype and early commercial traction across several high-profile industry verticals.
The role
As a Senior Research Scientist, your focus is post-training — curating data, fine-tuning pre-trained speech models, and building the evaluation infrastructure that validates it all. You'll work on large-scale models with access to significant data resources.
What you'll do
- Shape the data that goes into post-training: sourcing, cleaning and structuring it for large speech models
- Run supervised fine-tuning of pre-trained speech models
- Build evaluation workflows, both automated and human-in-the-loop
- Drive measurable improvements in hallucination rates, instruction-following and generalisation
What you'll bring
- PhD in ML or a related field with a strong publication record
- Hands-on experience training large speech models (ASR, TTS, or speech-to-speech)
- Solid post-training and SFT experience
The founding team includes a founding engineer from a billion-dollar AI company where they co-created one of the first generative models in the field, alongside the co-creator of the first generative voice at one of the world's largest tech companies.
Compensation is $400k to $500k base, plus generous equity.
Based in San Francisco, onsite. Relocation support is available for candidates already in the US who are willing to make the move.