Job Description
Site Reliability Engineer (SRE) Responsibilities
- Design, implement, and maintain scalable, highly available infrastructure.
- Monitor and ensure the performance and reliability of production systems.
- Implement automation for recurring tasks and operational processes.
- Collaborate with development teams to improve continuous delivery and code deployment.
- Respond to incidents and conduct post-mortem analysis to prevent future issues.
- Optimize resource usage and manage system capacity.
Requirements
- Prior experience in an SRE or similar role.
- Knowledge of Unix/Linux operating systems.
- Experience with monitoring and log management tools (Prometheus, Grafana, Splunk, ELK stack).
- Scripting and automation skills (Python, Go, Bash or other shell scripting).
- Experience with cloud platforms (AWS, GCP, Azure).
- Knowledge of containers and orchestration (Docker, Kubernetes).
Ready to Apply?
Take the next step in your AI career. Submit your application to Virtualent today.