Built-in Tools (Model Evaluation)

Model Evaluation

Evaluation capabilities help you compare the behavior of models or workflows. In the current ecosystem, the evaluation interfaces live in Studio, and the corresponding APIs live in monkeys-server.

Typical Uses

Compare prompts, models, or workflow versions.
Inspect generated outputs and execution logs.
Track quality before publishing workflow changes.
Support domain-specific evaluation pages where enabled.

Evaluation data should be treated as part of the application lifecycle, not as something to deal with only after deployment.