[Alibaba Cloud] Alibaba Cloud Intelligence - LLM Inference Optimization Expert/Senior Expert - Beijing/Shanghai/Shenzhen/Hangzhou
Full-time, experienced hire | Category: Technical - Development
Location: Beijing | Shenzhen | Hangzhou | Shanghai
Status: Hiring
Job Description
Requirements
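1. Lead end-to-end optimization of LLM inference, from computation-graph optimization and operator fusion to GPU memory management, building highly optimized solutions for Transformer architectures.
2. Build distributed inference engines: design hybrid scheduling strategies that combine model parallelism, pipeline parallelism, and tensor parallelism, scaling linearly on clusters of thousands of accelerators.
3. Strong grounding in both computer architecture and algorithm optimization: proficient in CUDA/Triton programming and able to optimize at the kernel level; familiar with compiler frameworks such as TVM, MLIR, and XLA.
4. Hands-on experience: has optimized models with tens of billions of parameters, such as LLaMA, GPT, and GLM; familiar with key techniques such as FlashAttention and PagedAttention.
5. Full-stack optimization: command of the complete technical chain, from algorithmic improvements (MoE, Mixture of Experts) and framework tuning (vLLM/DeepSpeed) to hardware co-design.
6. Performance tuning: able to run end-to-end performance analysis with tools such as Nsight Systems, with a proven ability to turn theoretical peak compute into real-world throughput.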
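FlashAttention, named in the requirements, rests on the "online softmax": attention is computed one block of keys at a time while keeping a running maximum and a running normalizer, so the full row of scores never has to be stored. The plain-Python sketch below shows only that recurrence for a single query vector; the function names and the block size parameter `block` are hypothetical, and it is an illustration of the idea, not FlashAttention's GPU kernel.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) @ V for a single query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                          # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def online_attention(q, keys, values, block=2):
    """Same result, but keys/values are visited one block at a time while keeping
    a running max (m), a running normalizer (l) and an unnormalized output (acc)."""
    d = len(q)
    m, l = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk, v_blk = keys[start:start + block], values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        m_new = max(m, max(scores))
        scale = math.exp(m - m_new)          # rescale earlier partial sums to the new max
        l *= scale
        acc = [x * scale for x in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - m_new)
            l += p
            acc = [x + p * vj for x, vj in zip(acc, v)]
        m = m_new
    return [x / l for x in acc]


q = [1.0, 0.5]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
ref = naive_attention(q, K, V)
tiled = online_attention(q, K, V, block=2)
```

The rescaling by `exp(m - m_new)` is what lets earlier blocks be corrected after a larger score appears, so the block size only changes the order of work, not the result (up to floating-point rounding).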
Responsibilities
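Improve the performance and efficiency of mainstream models such as DeepSeek, Tongyi (Qwen), and LLaMA, on single machines and on clusters across different GPU/NPU accelerators, through model-level, framework-level, and operator-level optimization.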
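Tensor parallelism, one of the strategies listed in the requirements, is a common way to spread a single model across the GPUs of one machine or a cluster. A minimal sketch of the Megatron-LM-style split of a two-layer MLP: the first weight matrix is split by columns, the second by rows, each simulated "rank" computes a partial output independently, and one all-reduce (here just a sum) recovers the full result. Assumptions: pure Python lists instead of tensors, ReLU in place of GeLU, hidden size divisible by `world_size`, and hypothetical helper names.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    """Elementwise ReLU; elementwise ops commute with a column split."""
    return [[max(0.0, x) for x in row] for row in m]


def split_cols(m, n):
    """Split matrix m into n column slices (assumes the width is divisible by n)."""
    w = len(m[0]) // n
    return [[row[r * w:(r + 1) * w] for row in m] for r in range(n)]


def split_rows(m, n):
    """Split matrix m into n row slices (assumes the height is divisible by n)."""
    h = len(m) // n
    return [m[r * h:(r + 1) * h] for r in range(n)]


def mlp_reference(x, a, b):
    """Unsharded MLP: relu(x @ a) @ b."""
    return matmul(relu(matmul(x, a)), b)


def mlp_tensor_parallel(x, a, b, world_size):
    """Each simulated rank holds a column slice of a and the matching row slice of b."""
    partials = [matmul(relu(matmul(x, a_i)), b_i)
                for a_i, b_i in zip(split_cols(a, world_size), split_rows(b, world_size))]
    # Simulated all-reduce: elementwise sum of the per-rank partial outputs.
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]


X = [[1.0, -2.0, 0.5], [0.0, 1.0, 3.0]]                       # 2 tokens, model dim 3
A = [[1.0, 0.0, -1.0, 2.0], [0.5, 1.0, 0.0, -1.0], [-2.0, 1.0, 1.0, 0.0]]  # 3 x 4
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]        # 4 x 2
full = mlp_reference(X, A, B)
sharded = mlp_tensor_parallel(X, A, B, world_size=2)
```

The design choice is that the nonlinearity sits between a column split and a row split, so no communication is needed until the single sum at the end of the block.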
Study resources, including English-language materials
LLMs+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Transformer
Inference engines+
https://www.youtube.com/watch?v=_dvk75LEJ34
https://www.youtube.com/watch?v=XtT5i0ZeHHE
Algorithms+
https://roadmap.sh/datastructures-and-algorithms
Step-by-step guide to learning Data Structures and Algorithms in 2025
https://www.w3schools.com/dsa/
CUDA+
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with NVIDIA CUDA and leverage GPUs for high-performance computing and deep learning.
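The core CUDA indexing idiom is that each thread computes a global index `blockIdx.x * blockDim.x + threadIdx.x` and skips work past the end of the array, with the grid sized by ceiling division. The sketch below emulates that sequentially in plain Python as a mental model only; the names `vector_add_kernel` and `launch` are hypothetical, it assumes a 1-D grid, and in real CUDA C++ the two loops disappear because the hardware runs the threads in parallel.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """Body of one emulated CUDA thread for out = a + b on a 1-D grid."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < n:                                # bounds check: the last block may be partially full
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate kernel<<<grid_dim, block_dim>>>(args...)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n
threads_per_block = 256
# Ceiling division: 4 blocks -> 1024 threads cover 1000 elements; 24 threads idle.
blocks = (n + threads_per_block - 1) // threads_per_block
launch(vector_add_kernel, blocks, threads_per_block, a, b, out, n)
```

Without the `i < n` guard the 24 surplus threads in the last block would write out of bounds, which on a GPU is a silent memory error rather than a Python `IndexError`.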
OS kernels+
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi gives a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
GPT+
https://www.youtube.com/watch?v=kCc8FmEb1nY
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3.
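A GPT is a decoder-only Transformer, so its self-attention is causal: position t may attend only to positions 0..t. A minimal single-head sketch in plain Python, assuming the query, key, and value vectors have already been produced (no learned projections, no batching, no multiple heads); the function name is hypothetical.

```python
import math


def causal_self_attention(qs, ks, vs):
    """Single-head causal attention: position t attends to positions 0..t only."""
    d = len(qs[0])
    outputs = []
    for t, q in enumerate(qs):
        # Scores only for keys at positions <= t; later positions are masked out.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)                      # subtract the max for numerical stability
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        outputs.append([sum(w[s] * vs[s][j] for s in range(t + 1)) / z
                        for j in range(len(vs[0]))])
    return outputs


qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[0.5, 0.1], [0.2, 0.9], [1.0, -1.0]]
vs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out = causal_self_attention(qs, ks, vs)
```

Position 0 can only see itself, so its output equals `vs[0]`. Batched implementations usually express the same mask as a lower-triangular matrix that sets future scores to negative infinity before the softmax.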
vLLM+
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), used to power AI-driven applications.
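vLLM's PagedAttention stores the KV cache in fixed-size blocks, and each sequence keeps a block table that maps its logical blocks to physical ones, much like virtual-memory paging. The toy allocator below illustrates only that bookkeeping. The class and method names and the block size of 4 are hypothetical choices for illustration; this is not vLLM's API and it stores no actual keys or values.

```python
class PagedKVCache:
    """Toy block-table bookkeeping in the spirit of PagedAttention (hypothetical API)."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}                      # seq_id -> list of physical block ids
        self.lengths = {}                           # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token, allocating a block only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:                # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("out of KV-cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = n + 1
        return self.slot(seq_id, n)

    def slot(self, seq_id, pos):
        """Translate a logical token position into (physical block, offset)."""
        block = self.block_tables[seq_id][pos // self.block_size]
        return block, pos % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id))
        self.lengths.pop(seq_id)


cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(6):
    cache.append_token("a")   # 6 tokens -> 2 blocks
for _ in range(3):
    cache.append_token("b")   # 3 tokens -> 1 block
```

Because blocks are allocated on demand, a sequence wastes at most `block_size - 1` slots in its last block instead of reserving space for the maximum sequence length up front, and freed blocks can be reused immediately by other requests.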
DeepSpeed+
https://www.youtube.com/watch?v=pDGI668pNg0
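DeepSpeed's ZeRO partitions model states across data-parallel GPUs. Under mixed-precision Adam, the ZeRO accounting uses 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states (master weights, momentum, variance); stage 1 partitions the optimizer states, stage 2 also the gradients, and stage 3 also the parameters. The sketch below applies only that arithmetic. It ignores activations, the KV cache, temporary buffers, and fragmentation, and the function name is hypothetical.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Per-GPU memory (GB, 1e9 bytes) for model states under ZeRO stages 0-3,
    assuming mixed-precision Adam: 2 + 2 + 12 bytes per parameter."""
    params = 2 * num_params    # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 master weights + momentum + variance
    if stage >= 1:
        optim /= num_gpus      # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus      # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus     # stage 3: also partition fp16 parameters
    return (params + grads + optim) / 1e9


per_stage = [zero_model_state_gb(7_500_000_000, 64, s) for s in range(4)]
```

For a 7.5B-parameter model on 64 GPUs this gives 120, about 31.4, about 16.6, and about 1.9 GB per GPU for stages 0 through 3, which shows why the optimizer states are the first thing worth partitioning.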
Performance tuning
Nsight
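The requirements mention turning theoretical compute into real throughput, and a profiler such as Nsight Systems shows where the time actually goes. A common first-order model for interpreting those measurements is the roofline: attainable throughput = min(peak compute, arithmetic intensity x memory bandwidth). The sketch below compares a batch-1 decode step (matrix-vector) with a large prefill-style matrix multiply. The peak and bandwidth numbers are illustrative placeholders, not the specs of any real GPU, it assumes each matrix is moved to or from memory exactly once, and the function names are hypothetical.

```python
def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming ideal caching
    (each matrix read or written exactly once) and fp16/bf16 elements."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_tflops(intensity, peak_tflops, bandwidth_tb_s):
    """Attainable TFLOP/s = min(compute roof, intensity [FLOP/B] * bandwidth [TB/s])."""
    return min(peak_tflops, intensity * bandwidth_tb_s)


PEAK_TFLOPS = 300.0     # placeholder compute roof, TFLOP/s
BANDWIDTH_TB_S = 2.0    # placeholder memory bandwidth, TB/s
ridge_point = PEAK_TFLOPS / BANDWIDTH_TB_S   # FLOP/byte where the two roofs meet

decode_ai = matmul_intensity(1, 4096, 4096)       # batch-1 decode step: matrix-vector
prefill_ai = matmul_intensity(4096, 4096, 4096)   # long-prompt prefill: large GEMM
decode_tflops = roofline_tflops(decode_ai, PEAK_TFLOPS, BANDWIDTH_TB_S)
prefill_tflops = roofline_tflops(prefill_ai, PEAK_TFLOPS, BANDWIDTH_TB_S)
```

Batch-1 decoding reads every weight once per generated token but performs only about one FLOP per byte, so it sits far left of the ridge point and is bound by memory bandwidth. Batching more requests raises `m` and, for `m` much smaller than `n` and `k`, raises the intensity roughly in proportion, moving decode toward the compute roof.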