Job Description

As a Principal member of the Site Reliability Engineering (SRE) team, you'll take ownership of highly available systems, influence service design, and work across teams to drive resiliency, automation, and operational excellence. This is a hands-on engineering role where deep infrastructure knowledge meets software engineering expertise, ideal for experienced SREs ready to take the lead.


What You’ll Do:

  • Lead the design, automation, and support of OCI services with a focus on resiliency, security, scalability, and performance.
  • Own and improve the end-to-end reliability metrics (SLOs, SLAs, KPIs) for your services.
  • Design and implement high-availability architectures and standards for large-scale distributed systems.
  • Serve as the ultimate escalation point for complex operational issues, using a deep understanding of service topologies and interdependencies.
  • Architect and build automation and orchestration tools that reduce m...
  • Ready to Apply?

    Take the next step in your AI career. Submit your application to Oracle today.

    Submit Application