Job Description

Make a significant impact at Confluent as an Expert Site Reliability Engineer focused on incident management and reliability enhancements. You'll work within a multi-cloud architecture to optimize performance and reliability.
This expert role blends 75% technical engineering with 25% strategy. You'll analyze systemic failure patterns, design reliability frameworks, and teach best practices. You'll also be instrumental in developing incident response processes that support the organization's long-term success and sustainability. Join a global team dedicated to improving the reliability of cloud-based services.
Key Responsibilities:
• Analyze systemic failure patterns and drive improvements to address them
• Own configuration and workflows for incident management tools
• Define SLO/SLA frameworks to guide reliability investments
• Edit incident documents so they are clear to customers
• Lead training programs and coach teams through post-mortems
Requirements:
• 10+ years of experience in SRE or incident manageme...
