New Job Opening: Applied Research Engineer - Synthetic Data in NYC or London

Job title:	Applied Research Engineer - Synthetic Data
Job type:	Permanent
Employment type:	Full-time
Industry:	Artificial Intelligence & Machine Learning
Salary type:	Annual
Salary:	negotiable
Location:	NYC or London
Job published:	15/04/2025
Job ID:	33152

Job Description

Shape the future of agentic AI through cutting-edge data strategy

Want to pioneer next-generation data techniques for advanced AI systems? This role combines frontier model research with practical implementation at one of Europe's most ambitious AI startups.

You'll join a rapidly growing AI Data team developing cutting-edge data-centric approaches that enhance LLMs, VLMs, and Action Models. This isn't just about collecting data – it's about transforming how AI systems learn and operate through synthetic generation, model distillation, and preference alignment.

Founded with a clear mission to push the boundaries of superintelligent agentic AI, this well-funded startup ($200M raised) is assembling world-class talent focused on both advancing capabilities and ensuring responsible development. Their approach is comprehensive – building proprietary technology from data to models, focusing on language, multimodal, and vision systems with superior performance and cost-effectiveness.

As an Applied Research Engineer focusing on synthetic data, you'll develop sophisticated data strategies that directly impact frontier AI systems:

Generate and augment synthetic multimodal datasets for VQA, agent behaviours, and virtual navigation
Apply model distillation techniques to optimise large-scale models for edge deployment
Design evaluation frameworks to measure improvements across multiple domains
Lead research into aligning data with human and AI preferences
Collaborate with cross-functional teams to integrate data-driven solutions

This role offers rare access to significant compute resources, with a massive GPU cluster that enables cutting-edge work. You'll be joining at a pivotal stage where your contributions will shape core technology and direction.

Requirements:

Strong Python programming skills covering parallel computing, system design, and large-scale deployments
Experience developing multimodal data pipelines
Background in training and deploying LLMs, VLMs or PyTorch models
MSc or PhD in machine learning, computer vision, NLP, or related field
Deep understanding of training and evaluation paradigms for multimodal models
Effectiveness in fast-changing environments

Nice to have:

Experience with agent-specific data pipelines
Background in multimodal human annotation platforms
Document understanding/OCR expertise
Synthetic data generation experience (particularly multimodal)

You'll have flexibility to work from New York, London, or remotely within European or US East Coast time zones. For those based in cities with offices, hybrid arrangements are available.

Your package includes a highly competitive salary ($200,000-$350,000 depending on experience) plus significant equity with strong upside potential.

If you're passionate about advancing AI through innovative data approaches and want to make a lasting impact on agentic systems, we'd love to hear from you. All applicants will receive a response.

Questionnaire

Do you have extensive experience with transformer models?

Have you worked on model training and data challenges?

Do you have the right to work in the US, UK, or France?
