Job Description

Key Responsibilities

Cluster Operations & Management:

  • Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units

  • Ensure optimal performance, scalability, and reliability of distributed systems

Infrastructure Platform Development:

  • Design, build, and enhance infrastructure operation platforms

  • Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging

  • Drive platform standardization and automation initiatives

High Availability & Reliability:

  • Ensure maximum uptime for production services through proactive monitoring and incident response

  • Continuously optimize service architecture, deployment strategies, and operational processes

  • Implement and maintain SLA/SLO frameworks and reliability...

Ready to Apply?

Take the next step in your AI career. Submit your application to Manus AI today.
