Contribute to the design and implementation of CUDA Core Libraries in C++ and/or Python, including parallel algorithms and language idiomatic exposure of core CUDA concepts. * Design and optimize GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization. * Collaborate with experienced ; participate in design reviews, code reviews, and open-source-style workflows. * Strong programming skills in C++, Python, or both, with interest in systems-level development (performance, memory, concurrency, API design). * Clear written communication for design discussions and documentation. * Comfort navigating and debugging large, multi-language codebases (C++, Python, CMake, GitHub Actions CI systems) with demonstrate
mehr