Job Description
Responsibilities:
Designing and implementing core platform components with an emphasis on reliability, scalability, and operational safety.
Building and maintaining cloud-native infrastructure, including networking, compute, and service orchestration.
Owning deployment workflows, warm-up processes, rollout strategies, and rollback mechanisms for production environments.
Establishing and maintaining platform standards for monitoring, alerting, logging, and incident response.
Developing and enforcing guardrails to reduce operational risk from abuse, misconfiguration, or traffic anomalies.
Defining and tracking SLIs, SLOs, and KPIs related to uptime, latency, and platform health.
Using AI-assisted engineering tools (Cursor, Claude Code, etc.) to improve development velocity and operational insight.
Documenting architecture, operational procedures, and known failure scenarios.

Requirements:
5+ years of...
Ready to Apply?
Take the next step in your AI career. Submit your application to CloudLinux today.