Job Description

Responsibilities

  • Design and enhance Kubernetes provider platforms and supporting infrastructure to improve scalability, reliability, and developer experience.
  • Automate and simplify Kubernetes cluster lifecycle management, upgrades, and observability workflows.
  • Implement monitoring and alerting systems using tools such as Prometheus, Grafana, or Elastic Observability to meet service-level objectives (SLOs).
  • Collaborate with security teams to integrate and enforce security controls and compliance requirements within the container platform.
  • Work with application teams to improve platform usability, streamline onboarding, and reduce operational toil.
  • Respond to incidents and perform post-incident reviews, driving continuous improvement and operational excellence.
  • Contribute to the reliability engineering culture, fostering shared responsibility for system availability and performance.

Requirements (Minimu...

    Ready to Apply?

    Take the next step in your AI career. Submit your application to Centre for Strategic Infocomm Technologies today.