Job Description

Responsibilities:

  • Install, configure, and maintain HPC systems , including compute, storage, and management nodes.
  • Manage HPC lab environments , ensuring availability and reliability for software development, testing, and validation activities.
  • Perform software installations, upgrades, and patch management for HPC environments.
  • Troubleshoot and resolve hardware and system-level issues in collaboration with vendors and internal teams.
  • Develop and maintain automation frameworks for system installation, configuration, and monitoring using Python Ansible , or similar tools.
  • Monitor system performance, analyze logs, and proactively identify and resolve performance bottlenecks.
  • Implement best practices for system security, backup, and recovery .
  • Maintain system ...

Ready to Apply?

Take the next step in your AI career. Submit your application to Hewlett Packard Enterprise today.

Submit Application