Job Description
We’re forming a team of innovators to roll out and enhance AI inference solutions at scale, leveraging NVIDIA’s GPU technology and Kubernetes. As a Solutions Architect (Inference Focus), you’ll collaborate closely with our engineering, DevOps, and customer success teams to drive enterprise AI adoption. Together, we'll bring generative AI to production!
What you'll be doing:
+ Help customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on Kubernetes for large language models (LLMs) and generative AI workloads.
+ Drive performance tuning with TensorRT/TensorRT-LLM, NVIDIA NIM, and Triton Inference Server to improve GPU utilization and model efficiency.
+ Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to customers implementing AI at scale.
+ Architect zero-downtime deployments, autoscaling (e.g., Kubernetes HPA with custom metrics, or equivalent), and integration with cloud-nat...
Ready to Apply?
Take the next step in your AI career. Submit your application to NVIDIA today.