Job Description

We are seeking a Site Reliability Engineer (SRE) with a strong software engineering background and a passion for building reliable, scalable, and highly observable systems. As an SRE, you will focus on improving service reliability through automation, reducing operational toil, implementing SLOs and error budgets, and partnering closely with software engineering teams to ensure smooth and stable production operations.

Essential Functions:

Reliability Engineering

  • Define, measure, and manage SLIs, SLOs, and error budgets across critical services.

  • Analyze system performance and identify opportunities to improve reliability, resilience, and scalability.

  • Lead reliability reviews and proactively prevent incidents before they impact customers.

Observability & Monitoring

  • Build and optimize monitoring, logging, and alerting systems.

  • ...

Ready to Apply?

Take the next step in your AI career. Submit your application to DiscountMugs today.