Job Description
About the Role
miniByte is hiring Deep Learning Model Optimization Engineers to build, train, and optimize state-of-the-art deep learning models for high-performance production deployment. This role sits at the intersection of research and systems engineering, with a strong focus on inference efficiency across GPUs and edge devices.
Key Responsibilities
- Design and implement deep learning models (CNNs, Transformers, hybrid architectures).
- Build scalable training pipelines and distributed training workflows.
- Apply model compression techniques: quantization, pruning, and knowledge distillation.
- Optimize inference using TensorRT, ONNX Runtime, OpenVINO, or TVM.
- Profile and analyze performance bottlenecks using GPU profiling tools.
- Develop custom CUDA/C++ kernels when required.
- Benchmark latency, throughput, and accuracy across hardware platforms.
- Collaborate on deployment using Triton Inference Server and c...
Ready to Apply?
Take the next step in your AI career. Submit your application to miniByte today.