Multiple Replica Load Balancing

Load balancing multiple Monkeys service replicas

Most Monkeys HTTP services can run as multiple replicas behind a Kubernetes Service or an ingress/load balancer. Because any replica may receive any request, state should live in PostgreSQL, Redis, object storage, or the owning external provider instead of local process memory.
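To illustrate why state belongs outside the process, here is a minimal sketch in which two replicas share one store. The `SharedStore` and `Replica` classes are hypothetical and are not part of Monkeys. The in-memory dictionary only stands in for a real shared backend such as Redis or PostgreSQL.

```python
from typing import Dict, Optional


class SharedStore:
    """In-memory stand-in for a shared store such as Redis or PostgreSQL."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class Replica:
    """Hypothetical service replica that keeps no session state of its own."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def login(self, session_id: str, user: str) -> None:
        # Write the session to the shared store, not to a field on this replica.
        self.store.set(f"session:{session_id}", user)

    def whoami(self, session_id: str) -> Optional[str]:
        return self.store.get(f"session:{session_id}")


store = SharedStore()
replica_a = Replica("a", store)
replica_b = Replica("b", store)

# The load balancer may send the login to one replica and the next
# request to another; both see the same session.
replica_a.login("s1", "alice")
print(replica_b.whoami("s1"))
```

If each replica kept sessions in its own dictionary instead, the second request would find no session whenever it landed on a different replica.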

Notes by Service Type

  • Frontends: scale as static web services.
  • Main backend: ensure database migrations are controlled (for example, not started concurrently by every replica) and that sessions and configuration are shared across replicas.
  • Workers: scale based on queue and task behavior.
  • Tool services: make external provider calls idempotent where retries may happen.
  • Agent and MCP services: review thread/session behavior before scaling.
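The idempotency note for tool services can be sketched as follows. This is an illustration under stated assumptions, not the Monkeys implementation: `idempotency_key`, `IdempotentCaller`, and `fake_provider` are hypothetical names, and the result cache is a plain dictionary standing in for a shared store.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def idempotency_key(task_id: str, payload: Dict[str, Any]) -> str:
    """Derive a stable key so a retry of the same task maps to the same call."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{task_id}:{canonical}".encode("utf-8")).hexdigest()


class IdempotentCaller:
    """Hypothetical wrapper that sends each distinct request at most once."""

    def __init__(self, send: Callable[[Dict[str, Any]], Any]) -> None:
        self._send = send
        # Assumption: in a multi-replica deployment this map would live in a
        # shared store; a local dict is used here only to keep the sketch runnable.
        self._results: Dict[str, Any] = {}

    def call(self, task_id: str, payload: Dict[str, Any]) -> Any:
        key = idempotency_key(task_id, payload)
        if key in self._results:
            # A retry: return the recorded result instead of calling again.
            return self._results[key]
        result = self._send(payload)
        self._results[key] = result
        return result


calls: List[Dict[str, Any]] = []


def fake_provider(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an external provider; records every real call."""
    calls.append(payload)
    return {"status": "sent", "attempt": len(calls)}


caller = IdempotentCaller(fake_provider)
first = caller.call("task-1", {"to": "user@example.com", "body": "hi"})
# Same task and payload (key order differs), as a retry would send it.
retry = caller.call("task-1", {"body": "hi", "to": "user@example.com"})
```

The check-then-store step here is not atomic. With several replicas, a real shared store would need an atomic set-if-absent operation, or the provider's own idempotency mechanism if it offers one.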

Always verify WebSocket, SSE, and other streaming endpoints when changing load-balancing behavior.
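One way to check an SSE endpoint after a load-balancing change is to read the stream through the new path and confirm that complete events still arrive. The sketch below covers only the parsing half: `parse_sse` is a hypothetical helper that extracts `data` payloads from SSE lines. Where the lines come from (an HTTP client pointed at the endpoint under test) is left as an assumption.

```python
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive ping.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
    # An event without a terminating blank line is incomplete and is dropped.


sample = ["data: a\n", "\n", ": ping\n", "data: b\n", "data: c\n", "\n", "data: cut"]
events = list(parse_sse(sample))
```

In this sample the final `data: cut` line has no terminating blank line, so it is not reported as an event. A stream cut short by a proxy timeout looks the same way to the client.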
