Job Description
Want to build the speech and audio models that define how the next generation of voice AI actually sounds and listens?
A well-funded AI startup has developed new model architectures that make real-time conversational AI finally viable at scale. While most voice AI still suffers from delays and computational bottlenecks, they've solved the core efficiency problems that have held the field back.
The role
As their Senior Research Scientist, you'll build core speech foundation models that could define the next decade of voice interaction. You'll work on novel architectures that have immediate real-world impact for thousands of customers.
What you'll do
- Design and implement SOTA speech foundation models
- Develop efficient algorithms for speech processing and audio understanding
- Create scalable systems that handle massive audio workloads
- Build comprehensive evaluation methods to validate model performance
- Collaborate with engineering teams to transition research into production
What you'll bring
- Deep expertise in modern speech technologies (TTS, Speech LLMs, Voice Conversion/Cloning, Speech Translation, ASR, Audio Understanding)
- Strong background in generative modelling for audio and speech
- Publications at leading conferences
- Track record of implementing research ideas from concept to production
You'll join a solid research team, including technical founders who've published work that's fundamentally shifted how the field thinks about efficient, large-scale foundation models. They're well funded and generating strong revenue. Comp is on par with top AI labs, with a base of $400k+ DOE plus a generous equity package.
The role is based in San Francisco, hybrid with 4 days a week in the office.
If you're excited about building the foundational models that will power the next generation of voice AI, we'd love to hear from you.
All applicants will receive a response.