In this role, you'll own the post-training pipeline for our multimodal models end to end, from data strategy and reward modeling to preference optimization, distillation, and safety tuning, across image, editing, and video.

* Own the full post-training pipeline end to end, from data curation and reward modeling through fine-tuning, preference optimization, distillation, safety tuning, evaluation, and deployment
* You've owned post-training for a frontier generative model through release (SFT, preference optimization such as DPO or RLHF, distillation, safety tuning) with measurable quality wins on human preferences or standard benchmarks

We'll discuss what this will look like for the role during our interview process.