Job Description
Ensure System Reliability and Availability
Oversee application performance and report any deviations and issues
Collaborate with application engineers and developers in root cause identification

Incident Management and Root Cause Analysis
Participate in incident response efforts for production outages as a Subject Matter Advisor
Provide insights from monitoring and in-depth code/database review
Assist with Application Operation post-mortem reviews

Automation and Tooling
Automate operational tasks such as monitoring and recovery
Develop scripts and tools to reduce manual toil and improve efficiency

Monitoring and Observability
Implement robust telemetry systems to monitor application health, latency, and error rates
Manage the Dynatrace platform and its integration with all application services
Assist the Application team with dashboard design and setup

Security and C...
Ready to Apply?
Take the next step in your AI career. Submit your application to AIA today.