Job Description

About the Role

miniByte is hiring Deep Learning Model Optimization Engineers to build, train, and optimize state-of-the-art deep learning models for high-performance production deployment. This role sits at the intersection of research and systems engineering, with a strong focus on inference efficiency across GPUs and edge devices.

Key Responsibilities

  • Design and implement deep learning models (CNNs, Transformers, hybrid architectures).
  • Build scalable training pipelines and distributed training workflows.
  • Apply model compression techniques: quantization, pruning, and knowledge distillation.
  • Optimize inference using TensorRT, ONNX Runtime, OpenVINO, or TVM.
  • Profile and analyze performance bottlenecks using GPU profiling tools.
  • Develop custom CUDA/C++ kernels when required.
  • Benchmark latency, throughput, and accuracy across hardware platforms.
  • Collaborate on model deployment using Triton Inference Server.
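To give candidates a flavor of the compression work listed above, here is a minimal sketch of symmetric per-tensor int8 quantization in pure Python (no framework assumed; function names are illustrative, not part of any specific library):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Maps a list of floats onto integers in [-127, 127] using a single
    scale factor derived from the largest absolute value in the tensor.
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes and the scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    recovered = dequantize_int8(q, scale)
    # Round-trip error is bounded by half the quantization step (scale / 2).
    print(q, scale, recovered)
```

In production this idea is applied per channel rather than per tensor, and tools such as TensorRT or ONNX Runtime handle calibration and kernel selection; the sketch only shows the underlying mapping.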

Ready to Apply?

Take the next step in your AI career. Submit your application to miniByte today.

Submit Application