Job Description
Position Overview (Job Summary):
The role is for an HPC Engineer responsible for designing, deploying, managing, and optimizing an on-premises High Performance Computing (HPC) environment.
The environment includes SLURM-managed CPU and GPU clusters .
Strong emphasis on HPC architecture, Linux administration, job scheduling, and cluster operations .
Experience with parallel/distributed storage (WekaFS, Scality) is preferred but optional .
Primary Skills:
HPC Operations & Cluster Management (CPU & GPU)
SLURM Workload Manager (Mandatory) Install/configure/manage SLURM across multiple clusters
Partitions/queues, fairshare, job priority, scheduling policies
Upgrades, migrations, automation via API/integrations
Linux System Administration (RHEL focus) OS patching, hardening, tuning, package management
Troubleshooting & Performance Optimization Cluster health, node/job failures, bottlenecks, utilization optimization
Parallel Computing Knowledge M...
The role is for an HPC Engineer responsible for designing, deploying, managing, and optimizing an on-premises High Performance Computing (HPC) environment.
The environment includes SLURM-managed CPU and GPU clusters .
Strong emphasis on HPC architecture, Linux administration, job scheduling, and cluster operations .
Experience with parallel/distributed storage (WekaFS, Scality) is preferred but optional .
Primary Skills:
HPC Operations & Cluster Management (CPU & GPU)
SLURM Workload Manager (Mandatory) Install/configure/manage SLURM across multiple clusters
Partitions/queues, fairshare, job priority, scheduling policies
Upgrades, migrations, automation via API/integrations
Linux System Administration (RHEL focus) OS patching, hardening, tuning, package management
Troubleshooting & Performance Optimization Cluster health, node/job failures, bottlenecks, utilization optimization
Parallel Computing Knowledge M...
Ready to Apply?
Take the next step in your AI career. Submit your application to HCLTech today.
Submit Application