Job Description
Drive reliability and operational maturity for Kubernetes workloads on GKE through safe rollout patterns, high-signal observability, resilient IaC, and effective incident response. Collaborate with developers to harden CI/CD pipelines and address infrastructure concerns within application code.
Key responsibilities:
Design and maintain resilient deployment patterns (blue-green, canary, GitOps syncs) across services.
Instrument and optimize logs, metrics, traces, and alerts to reduce noise and improve signal.
Review backend code (e.g., Django, Node.js, Go, Java) with a focus on infra touchpoints like database usage, timeouts, error handling, and memory consumption.
Tune and troubleshoot GKE workloads, HPA configs, network policies, and node pool strategies.
Improve or author Terraform modules for infrastructure resources (e.g., VPC, CloudSQL, Secrets, Pub/Sub).
Diagnose production issues from ...
Ready to Apply?
Take the next step in your AI career. Submit your application to Orion Innovation today.