We’re looking for a machine learning engineer (MLE) to build and scale distributed reinforcement learning (RL) systems for model training. You’ll deploy elastic environment microservices, design reward systems, and optimize multi-node and multi-datacenter training pipelines.
Responsibilities:
- Designing and implementing RL pipelines from reward modeling to policy optimization
- Optimizing RL training stability and sample efficiency for large models
- Verifying numerical correctness across inference and training
- Performance engineering for communication between the trainer and inference engines
- Validating methods from recent publications
Qualifications:
- Hands-on experience with reinforcement learning in production systems
- Deep understanding of policy-gradient methods (GRPO, PPO, etc.)
- Experience profiling distributed systems
Preferred:
- History of OSS contributions
- Familiarity with TorchTitan and with SGLang or vLLM