Job Description
Description
We are seeking a skilled Site Reliability Engineer to join our team and help build, maintain, and scale our cloud-native infrastructure. You will work closely with development and operations teams to ensure our systems are reliable, scalable, and efficient. The ideal candidate is passionate about automation, observability, and infrastructure-as-code, and thrives in a collaborative, fast-paced environment.
Key Responsibilities
Design, implement, and manage cloud infrastructure on Azure using Terraform and Terragrunt.
Maintain and optimize Kubernetes clusters on Azure Kubernetes Service (AKS).
Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD for GitOps deployments.
Enhance system reliability by implementing monitoring, alerting, and observability solutions with Grafana.
Automate operational tasks to reduce toil and improve team efficiency.
Participate in on-call rotations, incident response, a...
Ready to Apply?
Take the next step in your AI career. Submit your application to TEKsystems today.
Submit Application