Job Description

Key Responsibilities

- End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.

- Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention and continuous improvement.

- AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low-latency model serving and retraining pipelines.

- Application engineering: develop performant microservices for distributed, containerized, cloud-native systems.

- Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.

- Observability: define and implement monitoring, logging, alertin...

Ready to Apply?

Take the next step in your AI career. Submit your application to Oracle today.
