Job Description

Ensure System Reliability and Availability

  • Oversee application performance and report any deviations or issues
  • Collaborate with application engineers and developers to identify root causes

Incident Management and Root Cause Analysis

  • Participate in incident response for production outages as a subject matter advisor
  • Provide insights from monitoring data and in-depth code and database reviews
  • Assist with Application Operations post-mortem reviews

Automation and Tooling

  • Automate operational tasks such as monitoring and recovery
  • Develop scripts and tools to reduce manual toil and improve efficiency

Monitoring and Observability

  • Implement robust telemetry to monitor application health, latency, and error rates
  • Manage the Dynatrace platform and its integration with all application services
  • Assist application teams with dashboard design and setup

Security and C...

    Ready to Apply?

    Take the next step in your AI career. Submit your application to AIA today.