Job Description
Ready to pioneer deep generative modelling for real-time video synthesis?
Join a startup building the foundation layer for the next major AI unlock: natural behaviour and conversation in generated video. This will, in turn, change the game for embodied agents, enabling natural behaviours, real-time expression, and conversational intelligence that goes far beyond current avatar technology.
This Research Scientist role focuses on advancing embodied AI through groundbreaking generative modelling research. While existing solutions rely on looped animations with basic lip-sync, this company is building behaviour-driven models that power authentic, real-time interactions capable of natural conversation flow, interruption handling, and emotional expression.
Founded 18 months ago by an exceptional team, 7 of whose 12 members hold AI PhDs, they're solving fundamental challenges in visual generation for embodied intelligence. Their beta platform already demonstrates real-time video generation, with advanced generative models producing natural facial expressions and body movements.
The company is building foundational generative technology that creates dynamic visual content from multimodal inputs, developing systems that generate realistic human-like expressions and movements. Their research sits at the intersection of computer vision, deep generative modelling, and real-time video synthesis.
Your focus:
Conduct cutting-edge research in deep generative modelling for vision and video generation
Develop sophisticated generative models for facial expressions, body dynamics, and full avatar synthesis
Create novel architectures using diffusion models and flow matching for video generation
Build real-time generative pipelines for dynamic visual content creation
Advance state-of-the-art techniques in multimodal generative modelling
Collaborate with engineering to productionise generative models into real-time systems
Publish findings at top-tier conferences while deploying them in real-world applications
Technical challenges: You'll work with cutting-edge techniques including diffusion models, flow matching, and advanced generative architectures for video synthesis. The focus is on high-quality, temporally consistent video generation that can power natural embodied agents, emphasising real-time performance and visual fidelity.
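To make the core technique concrete for candidates weighing a fit: below is a minimal, self-contained PyTorch sketch of a flow matching (rectified flow) training objective, the kind of loss this family of models is trained with. The toy VelocityNet MLP and the flow_matching_loss helper are illustrative assumptions for exposition only, not the company's actual models or code; a production video model would use spatio-temporal architectures over latent frames.

```python
# Minimal sketch of a conditional flow matching (rectified flow) objective.
# Assumptions: toy MLP velocity field, vectors standing in for video latents.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity field v_theta(x_t, t); real video models would use
    spatio-temporal transformers or U-Nets instead of an MLP."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate the time scalar onto each sample as conditioning.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """Rectified-flow variant: interpolate noise -> data linearly and
    regress the constant target velocity (x1 - x0)."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # linear interpolant
    v_target = x1 - x0                             # ODE target velocity
    v_pred = model(x_t, t)
    return torch.mean((v_pred - v_target) ** 2)

if __name__ == "__main__":
    model = VelocityNet(dim=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x1 = torch.randn(64, 16)  # stand-in for a batch of real (latent) frames
    opt.zero_grad()
    loss = flow_matching_loss(model, x1)
    loss.backward()
    opt.step()
    print(f"flow matching loss: {loss.item():.4f}")
```

At sampling time, the learned velocity field is integrated from noise to data with an ODE solver; the real-time constraint in this role comes from doing that integration (and its diffusion-model counterparts) fast enough for live, temporally consistent video.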
Requirements:
PhD in Computer Vision, Machine Learning, or related field
Strong publication record at top conferences (CVPR, NeurIPS, ICCV, ECCV, ICML, ICLR, SIGGRAPH)
Publications on video generation or embodied agent/avatar research within the past 2 years (essential)
Expertise in flow matching and diffusion models
Experience with one or more: dyadic conversational avatars, behaviour modelling via LLMs, real-time multimodal generation
PyTorch proficiency and large-scale training experience
Nice to have:
Industry experience deploying generative models in real-time applications
Background in 3D generation, neural rendering, or Gaussian splatting
Experience with video generation frameworks and temporal consistency methods
Environment: You'll join a distributed team working primarily in Pacific Time zones, collaborating with specialists in generative modelling, computer vision, and video synthesis. The culture emphasises high ownership, velocity with purpose, and collaborative problem-solving in a fast-moving research environment.
Package:
Competitive salary: $200k to $300k base (flexible based on experience)
Meaningful equity package
Comprehensive healthcare (90% covered)
Unlimited PTO
Fully remote work with regular team offsites
Life and disability coverage
Location: Fully remote position with preference for Pacific Time alignment.
If you're excited about conducting pioneering research in deep generative modelling for vision while shaping the future of embodied agents, this role offers an exceptional opportunity to work on genuinely transformative technology.
Ready to help create the next generation of visual AI?
Contact Marc Powell at Techire AI. All applicants will receive a response.
Questionnaire
Do you have recent papers on video generation, avatars, embodied agents, or multimodal generation? They must be from the past 2-3 years and at top conferences (NeurIPS, ICML, CVPR, ICCV, ECCV, ICLR, etc.). Please select: Yes / No