Job Description
· Ensure critical systems meet uptime and performance SLAs (Service Level Agreements) and SLOs (Service Level Objectives)
· Participate in on-call rotations, lead post-mortems, and drive root cause analysis
· Implement redundancy, failover, and high availability strategies to keep services running smoothly.
· Build and maintain robust monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, Datadog)
· Ensure the security of infrastructure and pipelines by implementing best practices for access control, encryption, and vulnerability management.
· Collaborate with DevOps/Dev teams to build, maintain, and improve CI/CD pipelines
· Have fun with a great team while tackling hard challenges.
Job Qualifications:
· 5 years of experience designing, deploying, maintaining, and troubleshooting l...