Job Description
- Design and implement automated operational workflows to improve system reliability and reduce manual intervention
- Build and maintain observability solutions using tools such as Datadog to deliver metrics, monitoring, alerting, and dashboards
- Partner with development teams to improve application reliability, deployment safety, and performance through SRE best practices
- Develop and maintain CI/CD pipelines and deployment automation using Bitbucket/Jenkins, GitHub Actions, and related tooling
- Engineer scalable solutions for production environments across Linux and Windows systems
- Automate infrastructure and operational tasks using Python, PowerShell, Bash, or similar scripting languages
- Support and enhance the reliability of database platforms such as SQL Server and MongoDB from an SRE perspective
- Participate in incident response, drive root cause analysis, and implement long-term reliability improvements
- Def...
Ready to Apply?
Take the next step in your AI career. Submit your application to Point72 today.