Headquartered in Freiburg, Germany with a growing presence in San Francisco, we're scaling fast while staying true to what makes us different: research excellence, open science, and building technology that expands human creativity. You'll drive measurable gains in quality, build the infrastructure that lets the whole research team iterate fast, and push the state of the art in what it means to align a generative model to human intent. * Advance techniques across the post-training stack: SFT, RLHF, RLAIF, DPO, preference learning, and reward modeling to align models with human intent and aesthetic judgment * You've owned post-training for a frontier generative model through release (SFT, preference optimization (DPO or RLHF), distillation, safety tuning) with measurable quality wins on human prefs or standard benchmarks
mehr