HPC Architecture: Ability to design complex end-to-end HPC and AI architectures, including servers, storage, networking and management software, while collaborating with and leading a team of specialists in the different areas.
* Servers: Deep knowledge of the design of CPU/GPU-based clusters, high-density servers, and advanced interconnects (InfiniBand, Ethernet), including different cooling technologies and how to integrate them.
* AI Platforms: Deep knowledge of AI accelerators (NVIDIA GPUs, AMD Instinct…), including different cooling technologies and how to integrate them.
* HPC Software: Familiarity with MPI, job schedulers (e.g. Slurm), containerization, AI frameworks (TensorFlow, PyTorch) and other systems management tools (such as OpenNebula or similar).
* Networking: Deep knowledge of optimized network topologies for HPC clusters, including InfiniBand, high-speed Ethernet, and RDMA.