In this role, you'll own the post-training pipeline for our multimodal models end to end — from data strategy and reward modeling to preference optimization, distillation, and safety tuning — across image, editing, and video. You'll drive measurable gains in quality, build the infrastructure that lets the whole research team iterate fast, and push the state of the art in what it means to align a generative model to human intent. * Own the full post-training pipeline end to end — from data curation and reward modeling through fine-tuning, preference optimization, distillation, safety tuning, evaluation, and deployment * Identify quality and alignment gaps through rigorous evaluation, then close them through targeted research and engineering * You've owned post-training for a frontier generative model through release (SFT, preference optimization (DPO or RLHF), distillation, safety tuning) with ...
mehr