We're creating the generative models that power how people make images and video: tools used by millions of creators, developers, and businesses worldwide. We build the systems that make training runs lasting multiple weeks or months possible and that orchestrate resources efficiently at scale, enabling the next breakthrough model. If you're obsessed with distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement, this team would be perfect for you.

* Maintain research infrastructure, ensuring its health and optimizing components to extract peak performance from the system (on both the application and infrastructure sides).
* Identify and resolve performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale.
* Build and evolve telemetry and monitoring systems to provide deep visibility into infrastructure ...