Job Description
- Design and implement automated operational workflows to improve system reliability and reduce manual intervention
- Build and maintain observability solutions using tools such as Datadog to deliver metrics, monitoring, alerting, and dashboards
- Partner with development teams to improve application reliability, deployment safety, and performance through SRE best practices
- Develop and maintain CI/CD pipelines and deployment automation using Bitbucket/Jenkins, GitHub Actions, and related tooling
- Engineer scalable solutions for production environments across Linux and Windows systems
- Automate infrastructure and operational tasks using Python, PowerShell, Bash, or similar scripting languages
- Support and enhance the reliability of database platforms such as SQL Server and MongoDB from an SRE perspective
- Participate in incident response, drive root cause analysis, and implement long‑term reliability improvements