Job title:	Lead Research Scientist
Job type:	Permanent
Emp type:	Full-time
Industry:	Generative AI
Functional Expertise:	AI Avatars, Foundation Models, Gen-Speech/TTS, Multimodal AI, Speech-to-Speech, Voice Cloning
Salary type:	Annual
Salary:	negotiable
Location:	Remote
Job published:	02/07/2025
Job ID:	33482

Job Description

Ready to pioneer the speech intelligence behind the next generation of embodied AI?

Join a startup developing foundational technology for natural conversation in embodied agents. You'll advance the speech systems that power avatars with authentic behaviours, real-time expression, and conversational intelligence that handles interruptions and turn-taking the way humans do.

This Lead Research Scientist role focuses on advancing real-time speech systems for interactive avatars. You'll develop full-duplex dialogue models and speech-to-speech architectures that enable natural conversational flow, interruption handling, and emotional expression.

Founded by ex-Googlers, the company is building proprietary behaviour models that learn from two-way interactions, creating systems where speech timing, prosody, and contextual responses work in harmony with facial expressions and physical behaviours to drive authentic embodied intelligence.

Your focus:

Research and develop full-duplex speech systems with natural interruption handling
Develop expressive voice models with controllable prosody and timing
Build speech-to-speech architectures preserving identity and emotion
Create real-time audio generation systems for conversational avatars
Publish research while deploying systems in production
Collaborate across teams integrating speech with visual behaviour

Requirements:

PhD in Speech, Machine Learning, or related field
First-author publications at top conferences (Interspeech, ICASSP, NeurIPS, ICLR, etc.)
Expertise in text-to-speech, speech-to-speech models, or voice cloning
Large-scale training experience
Experience in prosody modelling or real-time audio generation

Nice to have:

Experience with full-duplex speech research
Speech-visual alignment expertise (lip sync, expressions)
Real-time audio deployment optimisation

Package:

Competitive salary: $200k-$300k base (depending on experience)
Meaningful equity package
Comprehensive healthcare (90% covered)
Unlimited PTO
Fully remote work with regular team offsites
Life insurance and disability coverage

Location: Fully remote, open to candidates globally, with a preference for Pacific Time alignment.

Ready to make AI conversations feel authentically human?

Contact Allys at Techire AI. All applicants will receive a response.
