[Xiaohongshu] Large Model Training Framework R&D Engineer / Expert
Full-time; Experienced hire; Engine; Location: Shanghai | Beijing; Status: Hiring
Job Description
Requirements
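Qualifications:
1. Proficient in at least one of C/C++ or Python in a Linux environment, with a solid foundation in data structures and algorithms and strong parallel programming skills;
2. Understanding of the internals and implementation of at least one mainstream deep learning framework (PyTorch, PaddlePaddle, TensorFlow, etc.), with experience developing or extending it directly;
3. Some familiarity with, or development experience in, distributed frameworks such as Megatron-LM and DeepSpeed and large-model fine-tuning toolkits such as LLaMA-Factory and XTuner;
4. Experience analyzing and tuning model training, able to use tools such as Nsight and nvprof to find training performance bottlenecks and optimize them in a targeted way;
5. Good communication and teamwork skills, with a strong sense of responsibility and mission.

Bonus points:
1. Familiarity with at least one classic deep learning model and its application scenarios;
2. Familiarity with the principles of distributed training strategies such as DP/TP/PP/ZeRO;
3. Knowledge of parallel computing, network communication, system optimization, and cluster hardware architecture;
4. Familiarity with NCCL/RDMA/IB/RoCE;
5. R&D experience with high-performance CUDA kernels;
6. Experience analyzing and tuning large-model training.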
Responsibilities
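1. Participate in or lead the design and implementation of the deep learning training framework, including AI infrastructure such as an efficient Dataloader and training and fine-tuning toolchains, to make business training more efficient;
2. Work closely with the company's algorithm teams, participating in or leading the optimization of training jobs for large language models, multimodal large models, computer vision, speech, natural language processing, and other workloads;
3. Analyze metrics such as GPU utilization and saturation across business lines, and keep improving the training framework's capabilities in light of business scenarios so that the framework stays ahead.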
English-language materials included
Linux
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
C
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in Python.
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial in Java
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.w3schools.com/dsa/
Deep learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
PyTorch
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
Megatron
https://www.youtube.com/watch?v=hc0u4avAkuM
DeepSpeed
https://www.youtube.com/watch?v=pDGI668pNg0
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Performance tuning
CUDA
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
PaddlePaddle
XTuner
Nsight
NVIDIA Visual Profiler