techire ai Engagement Hub

Want to own the data infrastructure behind some of the most naturalistic voice models in production?

You'll be joining a well-funded speech AI startup — just closed their Series A — with strong enterprise traction and revenue that more than doubled last quarter. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents. Their models are powering hundreds of millions of conversations monthly.

Before training a single model, they built their own corpus — full-duplex, studio-quality conversational speech annotated by PhD linguists. As their MLE, you'll own the pipelines that turn that raw material into clean, training-ready data.

What you'll do

Own end-to-end data pipelines from raw audio ingestion through to versioned, training-ready datasets
Build quality systems that catch annotation errors and alignment issues before they reach a training run
Maintain the training infrastructure that keeps GPUs fed — dataloaders, streaming datasets, multi-modal batching
Build and iterate on tooling across speech representations including neural codecs, semantic tokens and mel features
Handle full- and half-duplex pipeline work including two-channel alignment and overlap handling

What you'll bring

Strong engineering fundamentals with experience building ML data pipelines at scale
Hands-on experience with speech or audio data
Solid understanding of speech representations and the tradeoffs between them
Experience with multi-channel audio data including diarisation and alignment

Nice to have

Experience with multilingual data pipelines
Large-scale training infrastructure experience — FSDP, DeepSpeed, Ray
Annotation tooling and human-in-the-loop systems

Remote-friendly. Competitive base plus stock.

Location:	Remote, worldwide
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	04/05/2026
Job ID:	35965

Senior Research Scientist

Want to work on one of the hardest unsolved problems in voice AI — making it actually sound like a human conversation?

Most voice AI falls apart the moment a conversation gets messy. Someone interrupts, emotions shift, the flow breaks — and the model can't keep up.

A small, ambitious SF startup is tackling exactly these problems, building speech models that handle natural conversation the way humans actually experience it. They have a working prototype and early commercial traction across several high-profile industry verticals.

The role

As a Senior Research Scientist, your focus is post-training — curating data, fine-tuning pre-trained speech models, and building the evaluation infrastructure that validates it all. You'll work on large-scale models with access to significant data resources.

What you'll do

Shape the data that goes into post-training — sourcing, cleaning and structuring it for large speech models
Supervised fine-tuning of pre-trained speech models
Build evaluation workflows — automated and human-in-the-loop
Drive measurable improvements in hallucination rates, instruction-following and generalisation

What you'll bring

PhD in ML or a related field with a strong publication record
Hands-on experience training large speech models — ASR, TTS, or speech-to-speech
Solid post-training and SFT experience

The founding team includes a founding engineer from a billion-dollar AI company where they co-created one of the first generative models in the field, alongside the co-creator of the first generative voice at one of the world's largest tech companies.

Compensation is between $400k and $500k base with generous equity.

Based in San Francisco, onsite. Relocation support is available for those in the US who are willing to make the move.

Location:	San Francisco
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	30/04/2026
Job ID:	34047

Lead Research Scientist

Ready to architect the future of human-computer voice interaction?

Join an established conversational AI company as they transition from traditional cascaded speech systems to cutting-edge E2E speech-to-speech technology. You'll lead this transformation, building multimodal systems that will redefine how millions interact with AI.

The opportunity

You'll be leading the development of speech technology that directly impacts real users at massive scale. The company processes millions of daily interactions across major enterprise clients, meaning your research will shape real-world conversational experiences.

You'll spearhead the development of full-duplex speech systems, creating truly natural AI conversations that go far beyond current capabilities.

Your impact

Design and build next-generation multimodal speech LLM architecture from the ground up
Drive breakthroughs in speech-to-speech modeling and full-duplex conversation systems
Tackle turn-taking, interruption handling, and simultaneous speech processing
Bridge cutting-edge research with enterprise-grade production systems
Lead a growing team focused on SOTA speech-to-speech breakthroughs and own the development end-to-end

What you'll bring

Deep understanding of SOTA speech models and neural audio processing
Experience building speech language models/multimodal systems
Strong background in speech AI research and modern speech architectures

This is all underpinned by access to a large corpus of real enterprise conversational data and serious GPU infrastructure.

The company has built everything in-house, giving you complete technical control and the freedom to explore any approach that delivers value.

With their established market position and proven track record, you'll have the resources and real-world testing ground to make transformative impact with your research.

Location

Remote (must be within an EU timezone).

Location:	Remote
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	30/04/2026
Job ID:	33350

Speech Research Scientist

Want to build the speech and audio models that define how the next generation of voice AI actually sounds and listens?

A well-funded AI startup has developed new model architectures that make real-time conversational AI finally viable at scale. While most voice AI still suffers from delays and computational bottlenecks, they've solved the core efficiency problems that have held the field back.

The role

As their Senior Research Scientist, you'll build core speech foundation models that could define the next decade of voice interaction. You'll work on novel architectures that have immediate real-world impact for thousands of customers.

What you'll do

Design and implement SOTA speech foundation models
Develop efficient algorithms for speech processing and audio understanding
Create scalable systems that handle massive audio workloads
Build comprehensive evaluation methods to validate model performance
Collaborate with engineering teams to transition research into production

What you'll bring

Deep expertise in modern speech technologies (TTS, Speech LLMs, Voice Conversion/Cloning, Speech Translation, ASR, Audio Understanding)
Strong background in generative modelling for audio and speech
Publications at leading conferences
Track record of implementing research ideas from concept to production

You'll join a solid research team, including technical founders who've published work that's fundamentally shifted how the field thinks about efficient, large-scale foundation models. They're well-funded and generating strong revenue. Comp is on par with top AI labs, with a base of $400k+ DOE plus a generous equity package.

The role is based in San Francisco, hybrid with 4 days a week in the office.

If you're excited about building the foundational models that will power the next generation of voice AI, we'd love to hear from you.

All applicants will receive a response.

Location:	San Francisco
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	30/04/2026
Job ID:	33251

ML Engineer, Speech Data

Ready to own the data pipeline powering the voice of the next generation of AI characters?

You'll be joining a well-funded startup building AI character technology, where speech is a core part of the product experience.

Think highly natural conversations that handle interruptions, personality shifts and more!

You'll own the datasets that power their speech systems — from raw, messy audio through to clean, versioned training corpora that directly drive TTS and ASR model performance.

Your focus

Own the full data lifecycle — defining specs, auditing and curating large-scale audio and text corpora
Build automated quality metrics and dashboards across SNR, VAD, WER, speaker verification and safety, validated against listening tests
Train and deploy lightweight classifiers for noise detection, diarisation, language ID, and content moderation

What you'll bring

Deep experience working with speech and audio data at scale — 1M+ hours
Strong ML engineering skills in Python and PyTorch, including training and fine-tuning models like Whisper or Wav2Vec
Practical knowledge of audio processing — torchaudio, librosa, spectrograms, DSP basics
A solid understanding of audio quality metrics — MOS, WER, PESQ/STOI, SNR, speaker verification

Nice to have

Experience with Spark/Beam, Airflow, SQL or similar data engineering tools
Open-source contributions or publications in speech or audio ML
Background in denoising and enhancement, and how it affects downstream model quality

Remote, with a preference for European or overlapping timezones. Competitive compensation and equity.

Location:	Remote
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	27/03/2026
Job ID:	34412

Machine Learning Scientist

Looking to push the boundaries of generative AI for real-time interaction?

You'll be joining a well-funded startup working on multimodal AI where voice, vision, and language come together.

They're building generative models for natural conversational experiences that need to perform in real-time.

There are no resource limitations here: they have plenty of compute for you to run experiments at scale. You'll be working alongside a well-known open-source leader, as well as a very strong speech R&D team from leading companies.

Your mission

You'll be building and optimising diffusion or flow-matching models that power their speech and audio generation. This means developing production-ready architectures that can generate controllable, high-quality output at scale.

You'll own the full research-to-production pipeline - from architecture design and training through deployment and optimisation.

Your work will directly impact how millions of AI characters sound and interact.

Your focus

Design and train large-scale diffusion or flow-matching models
Develop novel architectures and training techniques to improve controllability and quality
Build evaluation systems to measure generation quality and model behaviour
Work from low-level performance optimisations to high-level model design

What you'll bring

Proven track record building diffusion models or flow-matching systems (this can be applied to other modalities)
Experience training large models (3B+ parameters) with distributed systems
Hands-on experience with streaming or distillation of diffusion models

Nice to have

Experience with audio or speech generation
Publications or open-source contributions in diffusion models or generative AI

Remote in Europe. Base salary is between €140K and €200K DOE (with some flex for the right person), plus generous stock.

Location:	Remote
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	26/01/2026
Job ID:	34280

Senior Speech Scientist

Want to build speech AI that actually sounds human?

You'll be joining a well-funded speech AI startup with strong customer traction. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents.

As their Senior Research Scientist, you'll work hands-on to expand their foundation models and push the boundaries of what's possible in speech AI: exploring multilingual capabilities, long-context generation, full-duplex modeling for natural conversations with interruptions, and novel architectures that balance speed with control.

What you'll do

Conduct research to advance their core speech models and extend product capabilities
Develop and experiment with new model architectures and training approaches
Work on large-scale model training and data systems
Collaborate with the team to take research from concept to deployed systems

What you'll bring

3+ years of experience in speech synthesis, audio generation, or generative modeling
Experience with audio generation using LLMs
Solid background in modern language model architectures
Proven ability to ship research into production systems
Experience training large-scale models

Nice to have

Published research in speech or generative modeling
Experience with real-time speech systems or multimodal models

Ideally based in SF, but remote worldwide can also be considered. Comp is up to $250K base DOE, plus equity.

Location:	San Francisco, CA
Job type:	Permanent
Emp type:	Full-time
Salary type:	Annual
Salary:	negotiable
Job published:	23/12/2025
Job ID:	34579
