Staff Site Reliability Engineer


New York
Permanent
USD 200,000 - USD 300,000
Development and Engineering
PR/557090_1755180912

Senior / Staff Site Reliability Engineer (SRE) - AI Infrastructure

Join a fast-moving team building cutting-edge GPU supercomputers for the world's leading AI labs, governments, and enterprises. We're a small, high-performing group focused on delivering a world-class supercomputing experience, and we're looking for engineers who thrive in dynamic environments and care deeply about their craft.

What You'll Do:

  • Deploy and manage clusters of 1,000+ GPUs using custom automation tools.
  • Validate and optimize compute, storage, and networking infrastructure.
  • Migrate petabytes of data from public cloud platforms to local storage efficiently.
  • Troubleshoot complex issues across the stack, from hardware quirks to software bottlenecks.
  • Build internal tools to improve deployment speed and cluster reliability.
  • Participate in an on-call rotation (up to one week per month).

What We're Looking For:

  • 2+ years of experience in SRE, DevOps, Sysadmin, or HPC engineering.
  • Strong coding skills in Go, Python, and Bash.
  • Experience with Kubernetes and/or SLURM.
  • Familiarity with Ansible, Terraform, and other IaC tools.
  • Excellent communication skills and a customer-first mindset.
  • Ability to navigate ambiguity and adapt to the fast-evolving AI landscape.

Bonus Points For:

  • Operating AI workloads at 1,000+ GPU scale.
  • Building multi-tenant, hyperscale Kubernetes services.
  • Hands-on datacenter experience with bare metal hardware.
  • Managing InfiniBand or RoCE networks.
  • Deploying petabyte-scale storage systems (DDN, VAST, Weka, Ceph, Lustre, etc.).

We value ownership, effectiveness, and a growth mindset. If you're passionate about infrastructure and want to help power the future of AI, we'd love to hear from you.

