Job Description

Key Responsibilities:
▪ Lead the monitoring and maintenance of system health using observability platforms such as AppDynamics, Dynatrace, Datadog, or New Relic.
▪ Provide expert consultation on, and design and implement, APM, Real User Monitoring, Synthetic Monitoring, Infrastructure Monitoring, and Log Management modules.
▪ Oversee incident, problem, change, and release management processes as per ITIL standards.
▪ Manage and drive major incident bridge calls and post-incident reviews (PIRs).
▪ Conduct root cause analysis and troubleshooting using tools like New Relic and Kibana.
▪ Develop and maintain monitoring alerts and dashboards.
▪ Resolve production issues across various services and stack levels.
▪ Ensure compliance with Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
▪ Develop monitoring solutions to detect symptoms and prevent outages.
▪ Automate operational processes to enhance system efficiency and reduce manual tasks.
...

Ready to Apply?

Take the next step in your AI career. Submit your application to Tekion Corp today.
