Job Description
About the Job
The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX, and Products to maintain carrier-grade service reliability.
What you’ll be responsible for
- Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
- Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
- Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
- Parti...
Ready to Apply?
Take the next step in your AI career. Submit your application to ValueFirst today.