Job Description

Responsibilities

Support production stability by monitoring, troubleshooting, and resolving service issues across distributed systems.

Perform initial triage and coordination for production problems, following defined procedures and escalation paths.

Support Kubernetes-based workloads, including pod health, service availability, resource usage, and basic networking issues.

Assist with deployment and change activities by validating configurations, monitoring rollouts, and minimizing disruption.

Support application build and delivery workflows, including basic code compilation, container image builds, and artifact validation.

Assist in troubleshooting data platforms and pipelines, including batch and streaming workloads.

Maintain clear operational communication, documentation, and shift handoffs while collaborating with engineering and platform teams.

Required Skills & Experience

  • Experience in a p...

Ready to Apply?

Take the next step in your AI career. Submit your application to TDCX today.