Looking to push the boundaries of generative AI for real-time interaction?
You'll be joining a well-funded startup working on multimodal AI, where voice, vision, and language come together.
They're building generative models for natural conversational experiences that need to perform in real time.
Your mission
You'll be building and optimising diffusion or flow-matching models that power their speech and audio generation.
This means developing production-ready architectures that can generate controllable, high-quality output at scale.
You'll own the full research-to-production pipeline - from architecture design and training through deployment and optimisation.
Your work will directly impact how millions of AI characters sound and interact.
Your focus
- Design and train large-scale diffusion or flow-matching models
- Develop novel architectures and training techniques to improve controllability and quality
- Build evaluation systems to measure generation quality and model behaviour
- Work across the stack, from low-level performance optimisations to high-level model design
What you'll bring
- Proven track record building diffusion models or flow-matching systems
- Experience training large models (3B+ parameters) with distributed systems
Nice to have
- Experience with audio or speech generation
- Publications or open-source contributions in diffusion models or generative AI
Remote in Europe, with competitive compensation plus stock.
| Location: | Remote |
|---|---|
| Job type: | Permanent |
| Emp type: | Full-time |
| Salary type: | Annual |
| Salary: | Negotiable |
| Job published: | 26/01/2026 |
| Job ID: | 34280 |