Job Description – HPC Engineer (HPC with SLURM, CPU & GPU Clusters)
Position Overview
We are seeking a skilled HPC Engineer to design, deploy, manage, and optimize our on-premises High Performance Computing (HPC) environment, which consists of SLURM-managed CPU and GPU clusters. The ideal candidate will have a strong understanding of HPC architecture, Linux systems, job scheduling, and cluster operations. Experience with parallel file systems and enterprise storage solutions such as Weka FS or Scality is preferred but not required.
Key Responsibilities
1. HPC Infrastructure & Operations
• Manage day-to-day operations of on-premises HPC clusters, including CPU and GPU compute nodes.
• Monitor cluster health, performance, and utilization, ensuring high availability and efficiency.
• Implement and maintain best practices for HPC operations, user management, and resource administration.
• Troubleshoot cluster-related issues, including networking, node failures, job failures, and ...
Ready to Apply?
Take the next step in your AI career. Submit your application to SISL Global today.