Job Description
Leadership & Strategy
Define and implement SRE best practices across the organization.
Proven expertise in production support, resilience engineering, disaster recovery (DCR), automation, and cloud operations.
Mentor and guide a team of SREs, fostering growth and technical excellence.
Collaborate with senior stakeholders to align reliability goals with business objectives.

Reliability & Performance
Establish SLIs, SLOs, and SLAs for critical services and ensure adherence.
Drive initiatives to improve system resilience and reduce operational toil.
Excellent at designing systems that detect and remediate issues without manual intervention (self-healing systems, runbook automation).
Exposure to tools like Gremlin, Chaos Monkey, and AWS FIS to simulate outages and improve fault tolerance.

Incident Management
Act as the primary point of escalation for critical production issues and lead major incident response, root cause analysis...
Ready to Apply?
Take the next step in your AI career. Submit your application to Experian today.