Job Description
About the Job
The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX and Products to maintain carrier-grade service reliability.
What you’ll be responsible for
Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
Participate in on-call rotations and handle production incidents with a focus on fast...
Ready to Apply?
Take the next step in your AI career. Submit your application to ValueFirst today.