You'll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments.

* Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization; build and extend high-level DSLs and compiler infrastructure to boost kernel developer productivity while approaching peak hardware utilization.
* Architect the scheduling and orchestration of large-scale containerized inference deployments on GPU clusters across clouds.
* Experience with cloud platforms (AWS/GCP/Azure), infrastructure as code, CI/CD, and production observability.