Job Description
Want to scale AI Agents through innovative ML infrastructure?
A pioneering AI company is looking for an experienced engineer to revolutionize how their agent technology is deployed and served. While others follow conventional paths, they're creating new approaches to agent-specific serving challenges.
Short term, you'll focus on cloud deployment and performance optimization, enhancing their current infrastructure. Long term, you'll help design and build proprietary frameworks that challenge the status quo of model serving.
What You'll Do:
- Architect and improve cloud-based deployment systems
- Create efficient solutions for concurrent model serving
- Lead the transition from open-source and third-party tools to custom frameworks
- Drive innovation in model compression and performance
- Design new approaches to large-scale model deployment
You Should Have:
- Advanced degree in Computer Science or related field
- Strong background in MLOps or model inference
- Experience scaling AI models in production
- Python expertise and interest in systems programming
- Track record of solving complex deployment challenges
Bonus Points For:
- Experience with LLM serving frameworks such as vLLM
- Building custom serving solutions
- Knowledge of GPU optimization
- Experience with large language models
- Background in high-performance computing
You'll join a world-class team pushing the boundaries of what's possible with AI agents. Work remotely (EU/US East Coast) or hybrid from their London office.
This role is perfect for someone who:
- Enjoys tackling unprecedented technical challenges
- Thinks creatively about infrastructure problems
- Values practical solutions while innovating for the future
- Thrives in fast-paced, research-driven environments
Compensation is highly competitive, reflecting the senior nature of the role.
Ready to help define the future of AI infrastructure? Contact Marc at Techire AI to learn more. All applications will receive a response.