Job Description
Want to own the inference layer behind millions of real-world voice AI interactions every day?
You’ll join a profitable, founder-led enterprise conversational AI company powering billions of interactions annually across 30+ languages. Their systems sit behind major global brands and handle millions of customer conversations daily.
They’re now moving toward end-to-end multimodal and speech-to-speech architectures. You’ll own the inference stack powering both their multimodal speech-text LLM and their text reasoning LLM.
The role goes well beyond tuning configs.
You will:
• Optimise production inference across A10, A100 and H100 GPUs
• Own scheduler design, KV cache allocation and batching logic
• Build serving systems tailored to multimodal audio-text workloads
• Support agentic, multi-step reasoning under real latency constraints
• Profile kernel-level bottlenecks and fix them properly
You’ve modified inference framework internals before, not just used them. You’re comfortable in Python and C++, and you’re happy diving into CUDA graphs, memory bandwidth limits or custom kernels when required.
This platform processes over 2 million interactions per day. Latency, throughput and cost are production realities, not lab metrics.
Package: €150,000 base + bonus + stock options
Location: Remote within Europe
If you want full ownership of inference performance at real enterprise scale, let’s talk.
All applicants will receive a response.