Job Description

We are seeking a hands-on Site Reliability Engineer (SRE) / AI Platform DevOps Engineer to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI-powered services, agents, and orchestration systems.

This is an SRE-heavy, infrastructure-first role focused on ensuring that AI systems running in production are:

  • Reliable

  • Observable

  • Scalable

  • Secure

  • Cost-efficient

  • Safe to deploy and operate

You will play a critical role in building and maintaining the platform foundation that enables AI services to run safely and efficiently at scale.

Key Responsibilities

1. Infrastructure Provisioning & Automation

  • Design and manage cloud infrastructure using Infrastructure as Code (Terraform or similar)

  • Provision and maintain Ku...

Ready to Apply?

Take the next step in your AI career. Submit your application to Endava today.