Job Description
Principal Site Reliability Engineer
We are seeking an experienced Principal Site Reliability Engineer to join a dynamic Platform Tribe. The role focuses on a high-end, microservice-based platform designed to process billions of financial transactions per day. You will be part of a team chasing zero latency and ensuring a smooth connection for global users, regardless of their bandwidth.
What you will be doing:
- Manage day-to-day alerts, system checks, and issue escalation.
- Provide 24x7 on-call support for critical SaaS events.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to clusters using Terraform and Helm/Flux.
- Enhance infrastructure health by implementing checks and scripts for known issues.
- Maintain and develop deployment code and integrate new Cloud Infrastructure technologies.
- Conduct RCA (Root Cause Analysis) and take corrective actions to prev...
Ready to Apply?
Take the next step in your AI career. Submit your application to Explore Group today.