Job Description
Inference Optimization
- Drive TTFT below 400ms for multi-step agent pipelines
- Streaming optimization: first token to user while sub-agents are still running
- KV cache strategy, prompt compression, dynamic context window management
- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models

Agent Architecture
- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
- Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
- Tool call design: schema design that LLMs actually follow reliably across providers

Evaluation & Harness
- Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR
- LLM-as-judge pipelines for qualitative output assessment
- Latency...
Ready to Apply?
Take the next step in your AI career. Submit your application to Zyoin Group today.