Job title: Research Scientist - Embodied AI
Job type: Permanent
Emp type: Full-time
Industry: AI Agents
Salary type: Annual
Salary: negotiable
Location: Remote
Job published: 18/06/2025
Job ID: 33449

Job Description

Ready to create foundation models for Embodied AI?

Join a pioneering startup developing the foundation layer for the next big AI unlock: naturalness of conversation, spanning the speech and visual elements, interruptions, and turn-taking. This in turn will change the game for embodied agents, enabling natural behaviours, real-time expression, and conversational intelligence that goes far beyond current avatar technology.

This Research Scientist role focuses on advancing embodied AI through groundbreaking research on audio-to-video models. While existing solutions rely on looped animations with basic lip-sync, this company is building behaviour-driven models that power authentic, real-time interactions capable of natural conversation flow, interruption handling, and emotional expression.

Founded 18 months ago by an exceptional team in which 7 of 12 members hold AI PhDs, the company is solving fundamental challenges in embodied intelligence. Its beta platform already demonstrates sophisticated real-time avatar systems with proprietary voice models and behaviour engines working in harmony.

The company is building foundational technology that learns from two-way video interactions, creating systems that understand and respond to both verbal and non-verbal cues. Its research sits at the intersection of computer vision, conversational AI, and real-time generation.

Your focus:

  • Conduct cutting-edge research in avatar modelling, behaviour generation, and style transfer
  • Develop sophisticated facial and body dynamics systems for expressive avatars
  • Create conversational AI systems that drive natural avatar behaviour through LLMs
  • Build real-time multimodal generation pipelines integrating visual, audio, and text
  • Contribute to behaviour model development for authentic interaction patterns
  • Collaborate with engineering to productionise research into real-time systems
  • Publish findings at top-tier conferences while deploying in real-world applications

Technical challenges: You'll work with cutting-edge techniques including diffusion models, flow matching, and Gaussian splatting. The focus is on dyadic conversational avatar development and natural behaviour modelling, emphasising authentic real-time interaction over static visual perfection.

Requirements:

  • PhD in Computer Vision, Machine Learning, or related field
  • Strong publication record at top conferences (CVPR, NeurIPS, ICCV, ECCV, ICML, ICLR, SIGGRAPH, etc.), including avatar research publications within the past 2 years (essential)
  • Expertise in flow matching and diffusion models
  • Experience with one or more of: conversational avatars, behaviour modelling, or real-time multimodal generation
  • PyTorch proficiency and large-scale training experience

Nice to have:

  • Industry experience deploying ML models in real-time applications
  • Voice research publications
  • Background in interactive systems or conversational AI

Environment: You'll join a distributed team working primarily in Pacific Time zones, collaborating with specialists in avatar development, voice research, and behaviour modelling. The culture emphasises high ownership, velocity with purpose, and collaborative problem-solving in a fast-moving research environment.

Package:

  • Competitive salary, $200k-$300k base (based on experience)
  • Meaningful equity package
  • Comprehensive healthcare (90% covered)
  • Unlimited PTO
  • Fully remote work with regular team offsites
  • Life insurance and disability coverage

Location: Fully remote position, open globally, with a preference for Pacific Time alignment.

If you're excited about conducting pioneering research on the next challenge in embodied intelligence while shaping the future of human-AI interaction, this role offers an exceptional opportunity to work on genuinely transformative technology.

Ready to help create AI that feels present, not just functional?

Contact Marc Powell at Techire AI. All applicants will receive a response.
