Job Description
Skills Required:
SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana,
AppDynamics, Site24x7, Telemetry, Splunk, CI/CD, DevOps, Kentico,
SRE (Site Reliability Engineering), AIOps, Agentic, GenAI, AI, ML
Experience Range:
10 - 16 years
Key Responsibilities:
• Design, develop, and maintain observability, monitoring, and alerting systems for AI
platforms and mission-critical backend services.
• Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools
such as Splunk, Prometheus, Grafana, and OpenTelemetry.
• Define and maintain SLOs, SLIs, and real-time health indicators across platform
services and APIs.
• Participate in on-call rotations and lead the resolution of high-impact incidents,
including root cause analysis and postmortem reporting.
• Collaborate with platform engineering teams to enforce governance, compliance, and<...
Ready to Apply?
Take the next step in your AI career. Submit your application to Brace Infotech Private Ltd today.