Built-in Tools (Model Evaluation)

Model and workflow evaluation capabilities

Model Evaluation

Evaluation capabilities help you compare how different models or workflows behave. In the current ecosystem, the evaluation interfaces live in Studio, and the corresponding APIs live in monkeys-server.

Typical Uses

  • Compare prompts, models, or workflow versions.
  • Inspect generated outputs and execution logs.
  • Track quality before publishing workflow changes.
  • Support domain-specific evaluation pages where enabled.
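
As a rough illustration of the first and third uses above, the sketch below scores two workflow versions against the same small test set and only approves the candidate if it does not regress. It is a generic, self-contained example, not the Studio or monkeys-server API: `EvalCase`, `exact_match`, `evaluate`, `safe_to_publish`, and the two stand-in workflow functions are all hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One test input with the output we expect (hypothetical structure)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expectation, ignoring case and padding."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(variant: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through a variant and return its mean score."""
    return mean(exact_match(variant(case.prompt), case.expected) for case in cases)


def safe_to_publish(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Approve the candidate only if it scores within `tolerance` of the baseline or better."""
    return candidate >= baseline - tolerance


# Stand-ins for two workflow versions. In practice these would invoke the
# real model or workflow; here they are lookup tables so the sketch runs offline.
def workflow_v1(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


def workflow_v2(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "largest planet": "Jupiter"}
    return answers.get(prompt, "unknown")


CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("largest planet", "Jupiter"),
]

baseline_score = evaluate(workflow_v1, CASES)
candidate_score = evaluate(workflow_v2, CASES)

print(f"v1: {baseline_score:.2f}  v2: {candidate_score:.2f}")
print("publish v2:", safe_to_publish(baseline_score, candidate_score))
```

Exact match is the simplest possible scorer. For free-form generated text, a similarity metric or a rubric-based check would likely fit better, and the same `evaluate` loop applies to either.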

Treat evaluation data as part of the application lifecycle, not as something to collect only after deployment.
