HPC Software: Familiarity with MPI, job schedulers (e.g. Slurm), containerization, AI frameworks (TensorFlow, PyTorch), and systems management tools such as OpenNebula.
* Storage: Familiarity with parallel file systems (Lustre, GPFS), NVMe, tiered storage, and data lifecycle management.
* Global Ecosystem Management: Ability to work effectively with leading vendors (HPE, Dell, Lenovo, Supermicro, NVIDIA, AMD) and to integrate with hyperscalers for hybrid HPC/AI deployments.
* Cloud & Hybrid HPC: Familiarity with integrating on-prem HPC clusters with cloud AI services for burst capacity and model training.