Job Description
The ideal candidate will drive automation, improve observability, and strengthen operational resilience across production environments.
Responsibilities:
- Design and operate scalable, highly available systems on AWS.
- Manage Kubernetes clusters and containerized workloads.
- Build and maintain CI/CD pipelines and deployment automation.
- Implement Infrastructure as Code (Terraform, CloudFormation, etc.).
- Enhance monitoring, logging, and distributed tracing frameworks.
- Define and manage SLIs/SLOs to improve system reliability.
- Lead incident triage, root cause analysis, and post-incident reviews.
- Collaborate with engineering and infrastructure teams to optimize system performance and security.
Requirements:
- 4–7+ years of experience in SRE, DevOps, or Cloud Engineering roles.
- Strong AWS and cloud-native architecture experience.
- Hands‑on expertise with Kuberne...
Ready to Apply?
Take the next step in your AI career. Submit your application to Systems Limited today.