Job Description
Responsibilities
Design and enhance Kubernetes provider platforms and supporting infrastructure to improve scalability, reliability, and developer experience.
Automate and simplify Kubernetes cluster lifecycle management, upgrades, and observability workflows.
Implement monitoring and alerting systems using tools such as Prometheus, Grafana, or Elastic Observability to meet service-level objectives (SLOs).
Collaborate with security teams to integrate and enforce security controls and compliance requirements within the container platform.
Work with application teams to improve platform usability, streamline onboarding, and reduce operational toil.
Respond to incidents and perform post-incident reviews, driving continuous improvement and operational excellence.
Contribute to the reliability engineering culture, fostering shared responsibility for system availability and performance.
Requirements (Minimu...
Ready to Apply?
Take the next step in your AI career. Submit your application to Centre for Strategic Infocomm Technologies today.