Your search has found 5 jobs

Ready to pioneer the speech intelligence behind the next generation of embodied AI?

Join a pioneering startup developing foundational technology for natural conversation in embodied agents. You'll advance the speech systems that power avatars with authentic behaviours, real-time expression, and conversational intelligence that handles interruptions and turn-taking just like humans.

This Lead Research Scientist role focuses on advancing real-time speech systems for interactive avatars. You'll develop full-duplex dialogue models and speech-to-speech architectures that enable natural conversational flow, interruption handling, and emotional expression.

Founded by ex-Googlers, they're building proprietary behaviour models that learn from two-way interactions, creating systems where speech timing, prosody, and contextual responses work in harmony with facial expressions and physical behaviours to drive authentic embodied intelligence.

Your focus:

  • Research & develop full-duplex speech systems with natural interruption handling
  • Develop expressive voice models with controllable prosody and timing
  • Build speech-to-speech architectures preserving identity and emotion
  • Create real-time audio generation systems for conversational avatars
  • Publish research while deploying systems in production
  • Collaborate across teams integrating speech with visual behaviour

Requirements:

  • PhD in Speech, Machine Learning, or related field
  • First-author publications at top conferences (Interspeech, ICASSP, NeurIPS, ICLR, etc.)
  • Expertise in text-to-speech, speech-to-speech models, or voice cloning
  • Large-scale training experience
  • Experience in prosody modelling or real-time audio generation

Nice to have:

  • Experience with full-duplex speech research
  • Speech-visual alignment expertise (lip sync, expressions)
  • Real-time audio deployment optimisation

Package:

  • Competitive salary: $200k-$300k base (based on experience)
  • Meaningful equity package
  • Comprehensive healthcare (90% covered)
  • Unlimited PTO
  • Fully remote work with regular team offsites
  • Life insurance and disability coverage

Location: Fully remote, open globally, with a preference for Pacific Time alignment.

Ready to make AI conversations feel authentically human?

Contact Allys at Techire AI. All applicants will receive a response.

 
Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 02/07/2025
Job ID: 33482

Ready to architect the future of human-computer voice interaction?

Join an established conversational AI company as they transition from traditional cascaded speech systems to cutting-edge E2E speech-to-speech technology. You'll lead this transformation, building multimodal systems that will redefine how millions interact with AI.

The opportunity

You'll be developing technology that directly impacts real users at massive scale. The company processes millions of daily interactions across major enterprise clients, meaning your research will shape real-world conversational experiences.

You'll spearhead the development of proprietary full-duplex speech systems, creating truly natural AI conversations that go far beyond current capabilities.

Your impact

  • Design and build next-generation speech language models from the ground up
  • Drive breakthroughs in speech-to-speech modeling and full-duplex conversation systems
  • Tackle turn-taking, interruption handling, and simultaneous speech processing
  • Bridge cutting-edge research with enterprise-grade production systems
  • Lead a growing team focused on speech-to-speech breakthroughs

What you'll bring

  • Deep understanding of SOTA speech models and neural audio processing
  • Experience building speech language models/multimodal systems
  • Strong background in speech AI research and modern speech architectures

With their established market position and proven track record, you'll have the resources and real-world testing ground to make transformative impact with your research.

The company has built everything in-house, giving you complete technical control and the freedom to explore any approach that delivers value.

Location

Remote (must be close to an EU time zone)

Location: Remote
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 28/06/2025
Job ID: 33350

Ready to build speech AI that actually works in real-time?

A well-funded AI startup has developed new model architectures that make real-time conversational AI finally viable at scale. While most voice AI still suffers from delays and computational bottlenecks, they've solved the core efficiency problems that have held the field back. 

The role

As their Speech Research Scientist, you'll build the speech models that could define the next decade of voice interaction. You'll work on novel architectures that have immediate real-world impact for thousands of customers.

What you'll do

  • Design and implement SOTA speech synthesis models
  • Develop efficient algorithms for voice processing and audio understanding
  • Create scalable systems that handle massive audio workloads
  • Build comprehensive evaluation methods to validate model performance
  • Collaborate with engineering teams to transition research into production

What you'll bring

  • Deep expertise in modern speech technologies (Text-to-Speech, Speech LLMs, Voice Conversion/Cloning, Speech Synthesis, Speech Translation, Speech Restoration)
  • Strong background in generative modeling for audio and speech
  • Publications at leading conferences
  • Track record of implementing research ideas from concept to production

This role is based in the Bay Area.
 
If you're excited about building the foundational models that will power the Voice AI revolution, we'd love to hear from you.
 
Location: Bay Area
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 05/06/2025
Job ID: 33251

Do you want to create AI that converses as naturally as humans do? A pioneering healthtech unicorn is building AI digital health agents designed to safely and empathetically assist patients.

As the Staff Research Scientist, you'll play a key part in making this a reality by building end-to-end foundational speech models capable of full-duplex communication. This isn't just about taking turns speaking; it's about creating AI that can listen and respond simultaneously, with the human-like conversation, emotion, and natural language that healthcare demands.

What you'll do

  • Design and develop novel speech foundation models for healthcare conversations, working end-to-end from research through to productionizing models
  • Work on post-training LLMs for speech to enhance their conversational capabilities
  • Tackle unique challenges including response time optimization, maintaining alignment between text and speech outputs, and operating in noisy environments
  • Create innovative approaches to synthetic conversational data generation
  • Have the opportunity to publish your groundbreaking research

What you'll bring

  • PhD with 8+ years of experience in speech technologies or a related field
  • Experience with Speech LLMs
  • Experience training models on large datasets
  • Strong publication record at top-tier conferences in speech/multimodal AI
  • Ability to implement research papers from scratch

Bonus points for

  • Experience pre-training foundation models with speech (HuBERT, Wav2Vec, or similar)
  • Multimodal experience
  • Experience with inference technologies (vLLM, CUDA)

You'll be based in the Bay Area and will receive highly competitive comp (up to $350K base DOE) with substantial equity.

If you're excited about creating the next generation of speech AI that will revolutionize healthcare communication, click apply!

Location: Bay Area
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: negotiable
Job published: 16/04/2025
Job ID: 33086

The future of communication should be bias- and barrier-free. That's the vision behind this well-funded start-up pioneering real-time speech algorithms.
 
You'll join a research team on tech that is the first of its kind, improving how we communicate in the real world by offering clear, natural-sounding conversations regardless of accent or environment. Their ground-breaking technology is already delivering impressive results, so it's no wonder they're growing 4x annually.
 
They are creating the whitespace in speech research, and you'll play a key role.
 
As a Senior ML Scientist, you'll work within a talented R&D team advancing core speech algorithms and audio AI models.
 
The role
 
- Contribute to cutting-edge R&D advancing core speech algorithms and Generative Audio models. Continually push boundaries to the next level.
- Tackle unsolved problems in Generative Speech and Audio such as preserving naturalness and performance in noisy environments.
- End-to-end ownership of models, from data collection to training on the cloud.
- Develop novel architectures balancing cutting-edge performance with real-time efficiency and low latency
- Collaborate with top scientists in this field
 
You'll have
 
- 4+ years of industry experience developing and implementing any of the following: TTS, Voice Conversion/Cloning, Speech Synthesis, Speech Translation, Accent Translation, Speech Restoration
- Proven background contributing to well-known research publications and/or products in these areas
- PhD or degree in Computer Science, ML, or related field. 
- Proven experience with PyTorch, TensorFlow and modern DL techniques such as GANs, VAEs, diffusion or flow models, etc.
- Familiar with cloud-based technologies and production environments
 
What you'll get in return
 
- Benefits include a competitive salary, share options, unlimited PTO, health coverage, and a VPO plan.
- Contributing to whitespace in speech technology research, you'll have control over the direction of your work with no friction whatsoever. You're the expert after all.
 
If you're looking to make an impact, there are few better places to do it. Your work here has the power to improve communication, eliminate confusion, and create a more connected world.
 
If you want the freedom to shape the future of speech AI, apply now.

Location: Bay Area
Job type: Permanent
Emp type: Full-time
Salary type: Annual
Salary: USD $300,000.00
Job published: 26/02/2025
Job ID: 32723