Senior AI Infrastructure Engineer - Training Platform

Negotiable
👤 Human Full-time
Posted: 4 weeks ago By: Scale AI

Description

As a Software Engineer on the Machine Learning Infrastructure team, you will build the "Operating System" for our large-scale GPU clusters. You will architect a high-performance training platform that handles the immense complexity of multi-thousand GPU workloads, ensuring every cycle is used efficiently. Your work directly determines the velocity at which our researchers can train and iterate on the world’s most advanced models.

The ideal candidate is a systems expert who thrives on solving the orchestration, networking, and reliability challenges that emerge at massive scale. You will partner closely with researchers to build a seamless, resilient environment that transforms raw compute into breakthrough AI.

You will:
- Architect and scale a multi-tenant orchestration layer that abstracts away the c

Posted by

Scale AI
Member since 2025