Job Description

Key Responsibilities:
● Design and implement evaluation harnesses to measure retrieval accuracy, citation correctness, response quality, and overall system behavior
● Develop automated tests for APIs, ingestion pipelines, and chat workflows
● Collaborate with developers and product managers to define quality metrics (accuracy, latency, cost, hallucination rate)
● Analyze logs, traces, and feedback signals to identify root causes of failures in AI-driven responses
● Create regression suites to ensure changes to prompts, chunking, or embeddings don't break existing behavior
● Validate REST APIs and service integrations for resilience, correctness, and security
● Contribute to observability by instrumenting metrics and dashboards for system performance
● Participate in sprint planning and retrospectives, ensuring testability is built into features from day one
Key Requirements:
● 3+ years of experience in software testing, quality engineering, ...

Ready to Apply?

Take the next step in your AI career. Submit your application to Confidential today.
