Job Description
Must Have:
Looking for a Site Reliability Engineer (SRE) with prior developer and architecture experience building Java-based enterprise applications, and who is seasoned in handling operational/production support issues.
Adopt SRE best practices: Work with dev teams to define non-functional requirements such as reliability, performance, scalability, and application logging for observability. Define SLIs/SLOs and error budgets, and maintain a focus on automation.
Incident Management: Lead the response to production issues, from identifying and troubleshooting problems to implementing immediate fixes. Ensure minimal downtime and adherence to service level agreements (SLAs). Recent and frequent engagement during incidents is a must.
Observability: Build alerting, monitoring, and dashboards that identify problems proactively. Recent hands-on experience with threshold based Al...
Ready to Apply?
Take the next step in your AI career. Submit your application to AT&T today.