Eval this site
Learn about evals by running them yourself - against this very site.
- Eval-Ception - can your agent pass the exam to speak on behalf of ai-evals.io? Hands-on tutorial using Promptfoo.
Have a cookbook to share?
I welcome external contributions:
- They should be runnable in a few commands with simple parameters, not just a statement of principles.
- They should be small enough to audit, and should take the user to eval results within 10 minutes (excluding LLM run time).
- Security expectations may evolve over time as we find the right balance between easy and secure defaults. I don’t want to create bad habits or unnecessary attack vectors, but I also recognize that people will simply use much of what they find.
Feedback welcome: Reach out directly (Linktree) or join Eval-Ception Discussions.