Job Description
Skills Required:
SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana,
AppDynamics, Site24x7, Telemetry, Splunk, CI/CD, DevOps, Kentico,
SRE, Site Reliability, AIOps, Agentic, GenAI, AI, ML
Experience Range:
10 - 16 years
Key Responsibilities:
- Design, develop, and maintain observability, monitoring, and alerting systems for AI
platforms and mission-critical backend services.
- Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools
such as Splunk, Prometheus, Grafana, and OpenTelemetry.
- Define and maintain SLOs, SLIs, and real-time health indicators across platform
services and APIs.
- Participate in on-call rotations and lead the resolution of high-impact incidents,
including root cause analysis and postmortem reporting.
- Collaborate with platform engineering teams to...
Ready to Apply?
Take the next step in your AI career. Submit your application to Brace Infotech Private Ltd today.