Job Description
- Design, deploy, and manage highly available and scalable infrastructure on AWS Cloud.
- Implement and maintain CI/CD pipelines using GitHub Actions.
- Manage and optimize Kubernetes clusters (EKS) for containerized workloads.
- Implement monitoring, logging, and observability solutions using Prometheus, Grafana, Loki, Promtail, and Coralogix.
- Ensure high availability, reliability, and performance of production systems.
- Plan, implement, and execute Disaster Recovery (DR) strategies, including DR drills and failover testing.
- Automate infrastructure provisioning, deployment, and configuration management.
- Troubleshoot production issues, perform root cause analysis, and provide permanent fixes.
- Collaborate with development, QA, and security teams to streamline DevOps workflows.
- Maintain documentation for infrastructure, deployment, and DR processes.
- Ensure best pract...
Ready to Apply?
Take the next step in your AI career. Submit your application to OpsTree Global today.