* Improve the Windows LLM and GenAI user experience on NVIDIA RTX by working on feature and performance enhancements of OSS software, including but not limited to projects such as GGML, Llama.cpp, Ollama, and ONNX Runtime.
* Work closely with internal and external engineering teams to solve local end-to-end LLM and Generative AI GPU deployment challenges, using techniques such as quantization or distillation.
* Conduct hands-on trainings and develop sample code and presentations that give guidance on efficient end-to-end AI deployment targeting optimal runtime performance.
* 5+ years of professional experience in local GPU deployment, profiling, and optimization.
* Strong proficiency in C/C++, Python, software design, and programming techniques.
* Experience working with open-source LLM and GenAI software.
* Experience with AI deployment on NPUs and ARM architectures.