Mindrift connects experienced engineers with project-based AI work for leading tech companies. We’re looking for freelance Agent Evaluation Engineers to help test, evaluate, and improve AI agents through structured, real-world scenarios.
What you’ll do
- Design test cases and gold-standard evaluation criteria
- Analyze agent behavior, logs, and failure modes
- Build and iterate on prompts, scenarios, and evaluation logic
- Work with code repos, test frameworks, and structured formats
What we’re looking for
- 3+ years of software development experience (strong Python)
- Experience with Git, JSON/YAML, and Docker
- Understanding of LLM limitations and evaluation design
- English proficiency (B2 or above)
How it works
- Freelance, project-based work (≈6–10 hrs/week during active phases)
- Choose projects and work on your own schedule
- Paid per project/task, with rates up to $80/hour depending on scope and expertise