Job Description

Skills Required:

SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana, AppDynamics, Site24x7, Telemetry, Splunk, CI CD, CI/CD, CICD, DevOps, Kentico, SRE, Site Reliability, AIOps, Agentic, GEN AI, AI, ML

Experience Range:

10 - 16 years

Key Responsibilities:

• Design, develop and maintain observability, monitoring, and alerting systems for AI platforms and mission-critical backend services.

• Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools such as Splunk, Prometheus, Grafana, and OpenTelemetry.

• Define and maintain SLOs, SLIs, and real-time health indicators across platform services and APIs.

• Participate in on-call rotations and lead the resolution of high-impact incidents, including root cause analysis and postmortem reporting.

• Collaborate with platform engineering teams to enforce governance, compliance, and<...

Ready to Apply?

Take the next step in your AI career. Submit your application to Brace Infotech Private Ltd today.
