You'll shape training objectives, architectures, data strategies, and systems behind our joint image, video, and audio foundation models, with a direct line from your research to products used by millions.

* Lead large-scale pretraining experiments for our multimodal (image, video, audio) foundation models (architecture, objective functions, scaling strategies)
* Own architectural calls that move the : attention patterns, modulation schemes, loss formulations, tokenization strategies