Job Description

Responsibilities

  • Design and implement core platform components with an emphasis on reliability, scalability, and operational safety.
  • Build and maintain cloud-native infrastructure, including networking, compute, and service orchestration.
  • Own deployment workflows, warm-up processes, rollout strategies, and rollback mechanisms for production environments.
  • Establish and maintain platform standards for monitoring, alerting, logging, and incident response.
  • Develop and enforce guardrails to reduce operational risk from abuse, misconfiguration, or traffic anomalies.
  • Define and track SLIs, SLOs, and KPIs related to uptime, latency, and platform health.
  • Use AI-assisted engineering tools (Cursor, Claude Code, etc.) to improve development velocity and operational insight.
  • Document architecture, operational procedures, and known failure scenarios.

Requirements

  • 5+ years of experience in platform engineering, in...
Ready to Apply?

Take the next step in your AI career. Submit your application to CloudLinux today.