Job Description

What you'll do

- Design and implement automated operational workflows to improve system reliability and reduce manual intervention

- Build and maintain observability solutions using tools such as Datadog to deliver metrics, monitoring, alerting, and dashboards

- Partner with development teams to improve application reliability, deployment safety, and performance through SRE best practices

- Develop and maintain CI/CD pipelines and deployment automation using Bitbucket/Jenkins, GitHub Actions, and related tooling

- Engineer scalable solutions for production environments across Linux and Windows systems

- Automate infrastructure and operational tasks using Python, PowerShell, Bash, or similar scripting languages

- Support and enhance the reliability of database platforms such as SQL Server and MongoDB from an SRE perspective

- Participate in incident response, drive root cause analysis, and implement long‑term reliability improvements

- Def...

Ready to Apply?

Take the next step in your AI career. Submit your application to Point72 today.
