Job Description

Position Overview

Observability-Focused Mindset

  • Deep understanding of telemetry: metrics, logs, traces, and events.

  • Experience with observability tools such as Datadog, AWS CloudWatch, SolarWinds, and OpenTelemetry.

  • Ability to define and refine SLIs/SLOs to measure system health.

  • Proactive monitoring: Builds dashboards and alerts that detect issues before users do.

Incident Management & Communication

  • Calm under pressure: Handles high-severity incidents with clarity and focus.

  • Strong communicator: Clearly articulates impact, status, and resolution steps to stakeholders.

  • Postmortem discipline: Writes blameless post-incident reports and drives follow-ups.

  • Collaboration: Works closely with developers, product, and support teams during incidents.

Logging & Iterative Improvements

  • Strategic logging: Adds meaningful logs that aid in debuggi...

Ready to Apply?

Take the next step in your AI career. Submit your application to Zelis today.