We’re looking for a Machine Learning Engineer (MLE) to build and scale distributed reinforcement learning (RL) systems for model training. You’ll deploy elastic environment microservices, design reward systems, and optimize multi-node and multi-datacenter training pipelines.

Responsibilities:

  • Designing and implementing RL pipelines from reward modeling to policy optimization
  • Optimizing RL training stability and sample efficiency for large models
  • Verifying numerical correctness across inference and training
  • Engineering the performance of trainer-inference communication
  • Validating methods from recent publications

Qualifications:

  • Hands-on experience with reinforcement learning in production systems
  • Deep understanding of policy-gradient methods (GRPO, PPO, etc.)
  • Experience profiling distributed systems

Preferred:

  • History of OSS contributions
  • Knowledge of TorchTitan and SGLang or vLLM

ARTIFICIAL INTELLIGENCE MADE HUMAN

NODES

THE AI ACCELERATOR COMPANY
