Job Description

Looking for a Site Reliability Engineer with the skills below.

Skills Required:

SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana, AppDynamics, Site24x7, Telemetry, Splunk, CI/CD, DevOps, Kentico, SRE, Site Reliability, AIOps, Agentic AI, GenAI, AI, ML

Experience Range:

10 to 16 years

Key Responsibilities:

• Design, develop, and maintain observability, monitoring, and alerting systems for AI platforms and mission-critical backend services.

• Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools such as Splunk, Prometheus, Grafana, and OpenTelemetry.

• Define and maintain SLOs, SLIs, and real-time health indicators across platform services and APIs.

• Participate in on-call rotations and lead the resolution of high-impact incidents, including root cause analysis and postmortem reporting.

Ready to Apply?

Take the next step in your AI career. Submit your application to Brace Infotech Private Ltd today.
