Job Description
Hardware: 8x RTX (… GB VRAM) + 8x H200 (1,536 GB VRAM) on Aether Cloud
Preferred: Former NVIDIA engineers or deep NVIDIA ecosystem experience
We need a battle-tested DevOps / GPU infrastructure engineer to execute a critical weekend migration. You will be working alongside our development team (which is actively building Agent v3 using Claude Code) to prepare the server infrastructure, create rollback snapshots, upgrade all drivers and firmware, and deploy the new agent architecture. This is a high-trust, high-impact engagement where your expertise will directly determine whether we can safely cut over to our next-generation system.
Scope of Work
1. Server Snapshot & Rollback Plan
Before any changes are made, create comprehensive recovery points so we can roll back to the current working state within minutes if the weekend migration cannot be completed.
- Full disk/volume snapshots of both GPU servers (the RTX and H200 nodes)
- Snapshot all running Docker containers, volumes,...
Ready to Apply?
Take the next step in your AI career. Submit your application to Confidential today.