Job Description

  • Design, deploy, and manage highly available and scalable infrastructure on AWS Cloud.
  • Implement and maintain CI/CD pipelines using GitHub Actions.
  • Manage and optimize Kubernetes clusters (EKS) for containerized workloads.
  • Implement monitoring, logging, and observability solutions using Prometheus, Grafana, Loki, Promtail, and Coralogix.
  • Ensure high availability, reliability, and performance of production systems.
  • Plan, implement, and execute Disaster Recovery (DR) strategies, including DR drills and failover testing.
  • Automate infrastructure provisioning, deployment, and configuration management.
  • Troubleshoot production issues, perform root cause analysis, and provide permanent fixes.
  • Collaborate with development, QA, and security teams to streamline DevOps workflows.
  • Maintain documentation for infrastructure, deployment, and DR processes.
  • Ensure best pract...

Ready to Apply?

Take the next step in your AI career. Submit your application to OpsTree Global today.