Job Description
Shape the future of agentic AI through cutting-edge data strategy
Want to pioneer next-generation data techniques for advanced AI systems? This role combines frontier model research with practical implementation at one of Europe's most ambitious AI startups.
You'll join a rapidly growing AI Data team developing cutting-edge data-centric approaches that enhance LLMs, VLMs, and Action Models. This isn't just about collecting data – it's about transforming how AI systems learn and operate through synthetic generation, model distillation, and preference alignment.
Founded with a clear mission to push the boundaries of superintelligent agentic AI, this well-funded startup ($200M raised) is assembling world-class talent focused on both advancing capabilities and ensuring responsible development. Their approach is comprehensive – building proprietary technology from data to models, focusing on language, multimodal, and vision systems with superior performance and cost-effectiveness.
As an Applied Engineer focusing on Data Research, you'll develop sophisticated data strategies that directly impact frontier AI systems:
Generate and augment synthetic multimodal datasets for VQA, agent behaviours, and virtual navigation
Apply model distillation techniques to optimise large-scale models for edge deployment
Design evaluation frameworks to measure improvements across multiple domains
Lead research into aligning data with human and AI preferences
Collaborate with cross-functional teams to integrate data-driven solutions
This role offers rare access to significant compute resources, with a massive GPU cluster that enables cutting-edge work. You'll be joining at a pivotal stage where your contributions will shape core technology and direction.
Requirements:
Strong Python programming skills covering parallel computing, system design, and large-scale deployments
Experience developing multimodal data pipelines
Background in training and deploying LLMs, VLMs, or other PyTorch-based models
MSc or PhD in machine learning, computer vision, NLP, or related field
Deep understanding of training and evaluation paradigms for multimodal models
Ability to work effectively in fast-changing environments
Nice to have:
Experience with agent-specific data pipelines
Background in multimodal human annotation platforms
Document understanding/OCR expertise
Synthetic data generation experience (particularly multimodal)
You'll have flexibility to work from New York, London, or remotely within European or US East Coast time zones. For those based in cities with offices, hybrid arrangements are available.
Your package includes a highly competitive salary ($200,000-$350,000 depending on experience) plus significant equity with strong upside potential.
If you're passionate about advancing AI through innovative data approaches and want to make a lasting impact on agentic systems, we'd love to hear from you. All applicants will receive a response.
Questionnaire
Do you have extensive experience with transformer models? (Yes / No)
Have you worked on model training and data challenges? (Yes / No)
Do you have the right to work in the US, UK, or France? (Yes / No)