Job Description
Site Reliability Engineer - Fully Remote
What We're Looking For
We're looking for someone who enjoys solving complex operational challenges through engineering rather than manual intervention. You'll be proactive, collaborative, and passionate about improving reliability through automation and continuous improvement.
If you're excited about building resilient cloud platforms and making a measurable impact on service reliability, we'd love to hear from you.
Key Responsibilities
Incident Management & Operations
Participate in a 24/7 on-call rota as a primary or escalation point.
Lead or support major incident response, including triage, mitigation, and resolution.
Coordinate with Engineering, Infrastructure, Security, and Product teams during incidents.
Develop, maintain, and continuously improve operational runbooks and playbooks.
Conduct blameless post-incident reviews and drive follow-up improvements.
Monitoring & Alerting
Ready to Apply?
Take the next step in your AI career. Submit your application to Spectrum IT Recruitment today.