Job Description
We are hiring a Site Reliability Engineer (SRE) to manage, support, and enhance enterprise data platforms. This role focuses on platform reliability, automation, and integration, ensuring scalability, stability, and compliance in a dynamic and fast-paced environment.
The Position:
- Design and implement automation frameworks to streamline operational tasks for data platforms (e.g., provisioning, configuration, monitoring, and incident remediation).
- Collaborate with Data Platform Engineers, data product teams, and business stakeholders to ensure reliability and performance of data platforms.
- Develop and maintain Infrastructure-as-Code (IaC) solutions for deploying and managing data platform components across environments.
- Establish robust monitoring, alerting, and observability systems to proactively detect and resolve issues.
- Drive incident management processes, including root cause analysis and post-mortem reviews, to...
Ready to Apply?
Take the next step in your AI career. Submit your application to Tek Systems today.