Job Description

AI Benchmark Engineer

Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems. As an AI Benchmark Engineer, you will design and build high‑quality multi‑agent benchmark tasks based on real‑world software engineering workflows. These tasks are derived from real open‑source code changes, such as bug fixes, migrations, and refactors, and are used to evaluate how effectively AI agents can understand large codebases, apply precise modifications, and produce correct, testable outputs.

Responsibilities

  • Build multi‑agent benchmark tasks based on real‑world open‑source code changes.
  • Use the Harbor evaluation framework to run and validate tasks within Docker environments.
  • Write clear, precise task instructions specifying file paths, function signatures, expected behavior, and constraints.
  • Design and implement Python‑based verification scripts to validate cor...

Ready to Apply?

Take the next step in your AI career. Submit your application to Turing today.
