Job Description
The Role:
We are seeking an exceptional AI Evaluation Engineer to design, implement, and scale frameworks for assessing the performance, reliability, and trustworthiness of advanced AI systems. This individual will be responsible for developing methodologies and tools to measure model quality across diverse dimensions, such as accuracy, robustness, reasoning, safety, and efficiency.
Key Responsibilities:
- Design and Develop Evaluation Frameworks: Create scalable, reproducible evaluation pipelines for large-scale AI systems, including LLMs and multi-agent architectures, covering both automated and human-in-the-loop testing strategies.
- Metric Innovation: Define and implement novel evaluation metrics that capture model capabilities beyond traditional benchmarks.
- Benchmarking & Performance Analysis: Benchmark AI models across domains, tasks, modalities, anal...
Ready to Apply?
Take the next step in your AI career. Submit your application to Openchip And Software Technologies SL today.