Job Description

Key Responsibilities:
▪ Lead the monitoring and maintenance of system health using observability platforms such as AppDynamics, Dynatrace, Datadog, or New Relic.
▪ Provide expert consultation, design, and implementation of APM, Real User Monitoring, Synthetic Monitoring, Infrastructure Monitoring, and Log Management modules.
▪ Oversee incident, problem, change, and release management processes as per ITIL standards.
▪ Manage and drive major incident bridge calls and post-incident reviews (PIRs).
▪ Conduct root cause analysis and troubleshooting using tools like New Relic and Kibana.
▪ Develop and maintain monitoring alerts and dashboards.
▪ Resolve production issues across various services and stack levels.
▪ Ensure compliance with Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
▪ Develop monitoring solutions to detect symptoms and prevent outages.
▪ Automate operational processes to enhance system efficiency and reduce manual tasks.
▪...

Ready to Apply?

Take the next step in your AI career. Submit your application to Tekion Corp today.