Job Description

Responsibilities:
  • Designing and implementing core platform components with an emphasis on reliability, scalability, and operational safety.
  • Building and maintaining cloud-native infrastructure, including networking, compute, and service orchestration.
  • Owning deployment workflows, warm-up processes, rollout strategies, and rollback mechanisms for production environments.
  • Establishing and maintaining platform standards for monitoring, alerting, logging, and incident response.
  • Developing and enforcing guardrails to reduce operational risk from abuse, misconfiguration, or traffic anomalies.
  • Defining and tracking SLIs, SLOs, and KPIs related to uptime, latency, and platform health.
  • Using AI-assisted engineering tools (Cursor, Claude Code, etc.) to improve development velocity and operational insight.
  • Documenting architecture, operational procedures, and known failure scenarios.

Requirements:
  • 5+ years of...

Ready to Apply?

Take the next step in your AI career. Submit your application to CloudLinux today.
