Job Description

We are seeking a hands‑on Site Reliability Engineer (SRE) / AI Platform DevOps Engineer to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI‑powered services, agents, and orchestration systems.

This is an SRE‑heavy, infrastructure‑first role focused on ensuring that AI systems operating in production are:

  • Reliable
  • Observable
  • Scalable
  • Secure
  • Cost‑efficient
  • Safe to deploy and operate

You will play a critical role in building and maintaining the platform foundation that enables AI services to run safely and efficiently at scale.

Key Responsibilities

1. Infrastructure Provisioning & Automation

  • Design and manage cloud infrastructure using Infrastructure as Code (Terraform or similar)
  • Provision and maintain Kubernetes clu...

Ready to Apply?

Take the next step in your AI career. Submit your application to Endava today.
