Post-training is where a foundation model becomes a product. In this role, you'll own the post-training pipeline for our multimodal models end to end, from data strategy and reward modeling to preference optimization, distillation, and safety tuning, across image, editing, and video. We're looking for someone who has shipped post-training for a frontier model before and wants to do it again.

* Own the full post-training pipeline end to end, from data curation and reward modeling through fine-tuning, preference optimization, distillation, safety tuning, evaluation, and deployment
* Advance techniques across the post-training stack: SFT, RLHF, RLAIF, DPO, preference learning, and reward modeling to align models with human intent and aesthetic judgment
* You've owned post-training for a frontier generative model through release (SFT, preference optimization (DPO or RLHF), ...