Job Description
Key Responsibilities
Cluster Operations & Management:
Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
Ensure optimal performance, scalability, and reliability of distributed systems
Infrastructure Platform Development:
Design, build, and enhance infrastructure operation platforms
Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
Drive platform standardization and automation initiatives
High Availability & Reliability:
Ensure maximum uptime for production services through proactive monitoring and incident response
Continuously optimize service architecture, deployment strategies, and operational processes
Implement and maintain SLA/SLO frameworks and reliability...
Ready to Apply?
Take the next step in your AI career. Submit your application to Manus AI today.