Job Description
What will you do?
* Design, develop, and document Infrastructure as Code (Terraform) for ML/LLM platform components on AWS/Databricks; implement secure, scalable foundations for data, compute, networking, and secrets.
* Build and maintain GitHub-based pipelines (Actions/Workflows) for training, packaging, validation, and deployment of ML/LLM assets (models, evaluation suites, prompts, policies), using GitOps for environment promotion.
* Containerize models using Docker and deploy them primarily through managed endpoints (SageMaker/Azure ML); Kubernetes-based serving (KServe/Triton/Seldon) is a plus.
* Operate model registries and feature stores; enforce versioning, lineage, and artifact governance via MLflow/Databricks and cloud-native services.
* Implement logs/metrics/traces, performance profiling, and drift/quality monitors; define SLIs/SLOs and on-call runbooks; drive incident response and post-mortems with a...
Take the next step in your AI career. Submit your application to Schneider Electric today.