Job Description
What will you do?
Design, develop, and document Infrastructure as Code (Terraform) for ML/LLM platform components on AWS/Databricks; implement secure, scalable foundations for data, compute, networking, and secrets.
Build and maintain GitHub-based pipelines (Actions/Workflows) for training, packaging, validation, and deployment of ML/LLM assets (models, evaluation suites, prompts, policies), using GitOps for environment promotion.
Containerize models with Docker and deploy them primarily through managed endpoints (SageMaker/Azure ML); Kubernetes-based serving (KServe/Triton/Seldon) is a plus.
Operate model registries and feature stores; enforce versioning, lineage, and artifact governance via MLflow/Databricks and cloud-native services.
Implement logs/metrics/traces, performance profiling, and drift/quality monitors; define SLIs/SLOs and on-call runbooks; drive incident response and post-mortems with accountability (business-hours support rotation).
Embed DevSecOps: secrets management...
Ready to Apply?
Take the next step in your AI career. Submit your application to Schneider Electric today.