Job Description
SRE
Key Responsibilities
Deployment & Automation
Understand, deploy, and maintain Helm charts and CI/CD workflows for AKS, EKS, and on-prem Kubernetes (K3s or RKE2) in customer environments.
Standardize customer deployments (private cloud / air-gapped) using reproducible manifests and configuration validation tooling.
Maintain our single-node and multi-node install processes and improve installer packaging.

Environment Reliability
Monitor uptime, capacity, and performance across distributed clusters (migration, scan, and OLAP DB node groups).
Implement proactive alerting (Prometheus, Grafana, Azure Monitor, CloudWatch) and ensure runbooks exist for all major services.
Coordinate with customer IT/security teams to handle firewall, proxy, and credential configurations safely and consistently.

Release & Incident Management
Participate in release-readiness and hardening cycles; validate new images and Helm charts before customer ...
Ready to Apply?
Take the next step in your AI career. Submit your application to FlairsTech today.