Job Description
Site Reliability Engineer (SRE) responsible for ensuring the reliability, availability, and performance of large-scale, cloud‑native services operating within a Google Cloud Platform (GCP) environment. This role partners closely with engineering teams to design resilient systems, define and measure service reliability using SLOs and SLIs, and manage error budgets to balance innovation with stability. The SRE leads incident management efforts, including on‑call response, incident coordination, root cause analysis, and post‑incident reviews, with a strong focus on reducing mean time to recovery and preventing recurrence through automation and engineering improvements. The ideal candidate brings deep experience in GCP services, infrastructure as code, monitoring and observability, and a calm, structured approach to operating high‑availability systems under pressure.

We are a company committed to creating diverse and inclusive environments wher...
Ready to Apply?
Take the next step in your AI career. Submit your application to Insight Global today.