Job Description
The DevOps & ML Ops Engineer will be responsible for developing and maintaining scalable, stable services that deliver machine learning models to end users with guaranteed uptime. The primary focus will be on the infrastructure, deployment, and continuous integration/continuous delivery (CI/CD) processes for our ML services.
RESPONSIBILITIES:
Manage resource allocation and workload scheduling for multiple ML services, ensuring efficient utilization of CPU/GPU resources and creating reliable queues based on service priorities.
Maintain VM environments, manage OS updates, and keep the VM inventory up to date.
Work alongside the Dev and QA teams to detect hot spots in our applications and put preventative measures in place before they become live issues.
Troubleshoot system configurations and provide solutions.
Plan, execute, and test disaster recovery.
Monitor and examine all application, performance, event, and system logs to assist in troubleshooting
Responsible for filing all ...
Ready to Apply?
Take the next step in your AI career. Submit your application to TransPerfect today.