Job Description

AI Infrastructure Engineer - L3


The Role


The AI Infrastructure Engineer (L3) provides advanced engineering and architectural expertise for high‑performance AI and ML infrastructure. This role focuses on building, optimizing, and scaling GPU/accelerator environments and distributed systems for large‑scale training and inference workloads.

Competency Focus: High‑performance computing (HPC), distributed systems, Kubernetes, GPU orchestration, cloud optimization


Keywords: Nvidia GPU Infrastructure, Kubernetes, GPU Cluster Administrator, Infrastructure SME, RCA


Responsibilities:

  • Deploy, configure, and manage GPU and AI accelerator platforms (NVIDIA A100/H100/L40, AMD Instinct, TPU).
  • Troubleshoot GPU hardware and software issues, including device failures, thermal throttling, PCIe/NVLink topology problems, and driver conflicts.
  • Install, upgrade, and maintain...

Ready to Apply?

Take the next step in your AI career. Submit your application to HCLTech today.
