Job Description
We are looking for an experienced L2 Engineer to operate and support high-performance AI infrastructure platforms, including NVIDIA GPU clusters, InfiniBand fabrics, and Kubernetes-based IaaS environments.
This role focuses on deep infrastructure expertise: ensuring the performance, scalability, and reliability of the platform layer that powers AI workloads, without being responsible for the workloads themselves.
You will play a key role in bare metal lifecycle management, advanced InfiniBand troubleshooting, and platform stability, working closely with engineering teams to operate cutting-edge infrastructure at scale.
Key responsibilities:
- Troubleshoot and maintain InfiniBand fabrics, including performance tuning, link issues, and topology validation.
- Act as the escalation point for L1 support on complex infrastructure and hardware issues.
- Own and maintain accurate infrastructure modeling, IPAM, and source...
Ready to Apply?
Take the next step in your AI career. Submit your application to Mirantis today.