Job Description

About the Job
The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX, and Products to maintain carrier-grade service reliability.
What you’ll be responsible for
- Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
- Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
- Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
- Participate in on-call rotations and handle production incidents with a focus on fast recovery and root cause analysi...

Ready to Apply?

Take the next step in your AI career. Submit your application to ValueFirst today.
