Job Description

We are looking for an experienced L2 Engineer to operate and support high-performance AI infrastructure platforms, including NVIDIA GPU clusters, InfiniBand fabrics, and Kubernetes-based IaaS environments.

This role centers on deep infrastructure expertise: ensuring the performance, scalability, and reliability of the platform layer that powers AI workloads, without responsibility for the workloads themselves.

You will play a key role in bare metal lifecycle management, advanced InfiniBand troubleshooting, and platform stability, working closely with engineering teams to operate cutting-edge infrastructure at scale.

Key responsibilities:

  • Troubleshoot and maintain InfiniBand fabrics, including performance tuning, link issues, and topology validation.
  • Act as the escalation point for L1 engineers on complex infrastructure and hardware issues.
  • Own and maintain accurate infrastructure modeling, IPAM, and source...

Ready to Apply?

Take the next step in your AI career. Submit your application to Mirantis today.
