Job Description

Site Reliability Engineer - Level 1
(MSP – Frontline Operations)
The Site Reliability Engineer (SRE) is the first responder for MSP customers, responsible for incident management, triage, and resolution within SLA-defined timelines.
As the frontline escalation point, this role diagnoses, troubleshoots, and resolves cloud infrastructure issues, escalating complex incidents to specialized teams (Senior SREs, SecOps, FinOps, or Cloud Engineering) when required.
Key Responsibilities
Act as the first responder for MSP client incidents, providing rapid troubleshooting, diagnosis, and resolution within SLA timelines.
Triage incoming issues to determine whether they can be resolved directly or require escalation to specialized teams.
Proactively monitor cloud environments using AWS CloudWatch, Datadog, and Freshservice to detect performance, security, and availability issues.
Maintain incident records, documenting resolution steps, escalation timelines, and root causes.
Co...

Ready to Apply?

Take the next step in your AI career. Submit your application to Dariel today.