techire ai Engagement Hub

Want to own the data infrastructure behind some of the most naturalistic voice models in production?

You'll be joining a well-funded speech AI startup — just closed their Series A — with strong enterprise traction and revenue that more than doubled last quarter. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents. Their models are powering hundreds of millions of conversations monthly.

Before training a single model, they built their own corpus — full-duplex, studio-quality conversational speech annotated by PhD linguists. As their MLE, you'll own the pipelines that turn that raw material into clean, training-ready data.

What you'll do

Own end-to-end data pipelines from raw audio ingestion through to versioned, training-ready datasets
Build quality systems that catch annotation errors and alignment issues before they reach a training run
Maintain the training infrastructure that keeps GPUs fed — dataloaders, streaming datasets, multi-modal batching
Build and iterate on tooling across speech representations including neural codecs, semantic tokens and mel features
Handle full- and half-duplex pipeline work including two-channel alignment and overlap handling

What you'll bring

Strong engineering fundamentals with experience building ML data pipelines at scale
Hands-on experience with speech or audio data
Solid understanding of speech representations and the tradeoffs between them
Experience with multi-channel audio data including diarisation and alignment

Nice to have

Experience with multilingual data pipelines
Large-scale training infrastructure experience — FSDP, DeepSpeed, Ray
Annotation tooling and human-in-the-loop systems

Remote-friendly. Competitive base plus stock.

Location:	Remote, worldwide
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	04/05/2026
Job ID:	35965

Lead Research Scientist

Ready to architect the future of human-computer voice interaction?

Join an established conversational AI company as they transition from traditional cascaded speech systems to cutting-edge E2E speech-to-speech technology. You'll lead this transformation, building multimodal systems that will redefine how millions interact with AI.

The opportunity

You'll be leading the development of speech technology that directly impacts real users at massive scale. The company processes millions of daily interactions across major enterprise clients, meaning your research will shape real-world conversational experiences.

You'll spearhead the development of full-duplex speech systems, creating truly natural AI conversations that go far beyond current capabilities.

Your impact

Design and build next-generation multimodal speech LLM architecture from the ground up
Drive breakthroughs in speech-to-speech modeling and full-duplex conversation systems
Tackle turn-taking, interruption handling, and simultaneous speech processing
Bridge cutting-edge research with enterprise-grade production systems
Lead a growing team focused on SOTA speech-to-speech breakthroughs and own the development end-to-end

What you'll bring

Deep understanding of SOTA speech models and neural audio processing
Experience building speech language models/multimodal systems
Strong background in speech AI research and modern speech architectures

This is all underpinned by access to a large corpus of real enterprise conversational data and serious GPU infrastructure.

The company has built everything in-house, giving you complete technical control and the freedom to explore any approach that delivers value.

With their established market position and proven track record, you'll have the resources and real-world testing ground to make transformative impact with your research.

Location

Remote (must be within an EU time zone).

Location:	Remote
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	30/04/2026
Job ID:	33350

Senior Speech Scientist

Want to build speech AI that actually sounds human?

You'll be joining a well-funded speech AI startup with strong customer traction. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents.

As their Senior Speech Scientist, you'll work hands-on to expand their foundation models and push the boundaries of what's possible in speech AI: exploring multilingual capabilities, long-context generation, full-duplex modeling for natural conversations with interruptions, and novel architectures that balance speed with control.

What you'll do

Conduct research to advance their core speech models and extend product capabilities
Develop and experiment with new model architectures and training approaches
Work on large-scale model training and data systems
Collaborate with the team to take research from concept to deployed systems

What you'll bring

3+ years of experience in speech synthesis, audio generation, or generative modeling
Experience with audio generation using LLMs
Solid background in modern language model architectures
Proven ability to ship research into production systems
Experience training large-scale models

Nice to have

Published research in speech or generative modeling
Experience with real-time speech systems or multimodal models

Ideally in SF, but can also consider remote worldwide. Comp is up to $250K base DOE, plus equity.

Location:	San Francisco, CA
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	23/12/2025
Job ID:	34579

Staff Research Scientist

Build speech AI at trillion-parameter scale, driving human-to-AI conversation

This team has built their entire speech stack in-house, including proprietary LLM-based ASR and TTS, both already outperforming SOTA benchmarks. It's already powering real-time, human-to-AI conversations at scale, every day.

Now they're working at trillion-parameter scale to push toward a true end-to-end speech-to-speech LLM, one that understands and responds with genuine emotional intelligence and natural, human-like conversation. That means solving problems like long-context reasoning, pronunciation accuracy, and consistency in noisy, real-world environments, in a domain where no model has cracked this yet.

They're hiring Senior and Staff-level Speech Scientists (Principal-level also considered) to help drive that work.

What you'll do:

Build SOTA speech models from the ground up, at genuinely large scale
Own problems end-to-end, from research through to production
Solve hard, domain-specific speech challenges as part of the push toward speech-to-speech LLM research
Help shape technical direction as part of a team working at the frontier of this space

What you'll bring:

Deep, hands-on expertise in at least one of: Speech/Audio LLMs, TTS or audio generation, or large-scale speech understanding
Experience shipping speech systems at real scale, not just research-stage prototypes

Nice to have:

Experience pre-training speech foundation models (HuBERT, Wav2Vec or similar)
Multimodal experience

Package: Up to $350K–$400K base (DOE), plus substantial equity. On-site, South Bay or Seattle.

Location:	Bay Area
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	16/04/2025
Job ID:	33086
