The Data Center Operations Engineer plays a critical role in maintaining and expanding Cadence's global data center infrastructure, with a strong focus on Linux-based systems and GPU server environments.

* Deploy and maintain Linux-based compute, GPU, and storage infrastructure across data center environments, ensuring high availability and consistent performance.
* Configure and bring up InfiniBand fabric and GPU clusters, including switch configuration, subnet management, and end-to-end validation testing.
* Monitor the data center environment using established alerting frameworks, escalate issues appropriately, and drive timely service restoration in line with SLAs.
* Collaborate with global infrastructure and operations teams to support data center builds, migrations, refresh programmes, and process improvement initiatives.