Job Description

We are looking for a Site Reliability Engineer with the skills below.
Skills Required:
SQL, NoSQL, Nagios, CloudWatch, Zabbix, Datadog, New Relic, Prometheus, Grafana,
AppDynamics, Site24x7, Telemetry, Splunk, CI/CD, DevOps, Kentico,
SRE (Site Reliability Engineering), AIOps, Agentic AI, Generative AI, AI, ML
Experience Range:
10 - 16 years
Key Responsibilities:
• Design, develop, and maintain observability, monitoring, and alerting systems for AI
platforms and mission-critical backend services.
• Design telemetry pipelines, logging infrastructure, and metrics dashboards using tools
such as Splunk, Prometheus, Grafana, and OpenTelemetry.
• Define and maintain SLOs, SLIs, and real-time health indicators across platform
services and APIs.
• Participate in on-call rotations and lead the resolution of high-impact incidents,
including root cause analysis and postmortem reporting.
• Collaborate with platform engineering teams to enforce governance, compliance, and
s...

Ready to Apply?

Take the next step in your AI career. Submit your application to Brace Infotech Private Ltd today.