Job Description

Key Responsibilities:

▪ Lead the monitoring and maintenance of system health using observability platforms such as AppDynamics, Dynatrace, Datadog, or New Relic.

▪ Provide expert consultation, design, and implementation of APM, Real User Monitoring, Synthetic Monitoring, Infrastructure Monitoring, and Log Management modules.

▪ Oversee incident, problem, change, and release management processes as per ITIL standards.

▪ Manage and drive major incident bridge calls and post-incident reviews (PIRs).

▪ Conduct root cause analysis and troubleshooting using tools like New Relic and Kibana.

▪ Develop and maintain monitoring alerts and dashboards.

▪ Resolve production issues across various services and stack levels.

▪ Ensure compliance with Service Level Objectives (SLOs) and Service Level Agreements (SLAs).

▪ Develop monitoring solutions to detect symptoms and prevent outages.

▪ Automate operational processes to enhance syst...

Ready to Apply?

Take the next step in your AI career. Submit your application to Tekion Corp today.