Job Description

About the Job

The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX, and Products to maintain carrier-grade service reliability.


What you’ll be responsible for

  • Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
  • Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
  • Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
  • Participate in on-call rotations and handle p...

Ready to Apply?

Take the next step in your AI career. Submit your application to ValueFirst today.