Multiple Replica Load Balancing
Load balancing multiple Monkeys service replicas
Most Monkeys HTTP services can be scaled horizontally by running multiple replicas behind a Kubernetes Service or an ingress/load balancer. Because consecutive requests may reach different replicas, state should live in PostgreSQL, Redis, object storage, or the owning external provider instead of local process memory.
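To illustrate why local process memory breaks under load balancing, here is a minimal simulation. It is only a sketch: `SharedStore`, `Replica`, and `simulate` are hypothetical names, and the shared store is an in-memory stand-in for Redis or PostgreSQL rather than a real client.

```python
import itertools


class SharedStore:
    """Hypothetical stand-in for an external store such as Redis or PostgreSQL."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class Replica:
    """One service replica; local_counts models per-process memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.local_counts = {}  # invisible to every other replica

    def handle_local(self, session_id):
        # State kept in process memory: each replica sees only its own requests.
        self.local_counts[session_id] = self.local_counts.get(session_id, 0) + 1
        return self.local_counts[session_id]

    def handle_shared(self, session_id):
        # State kept in the shared store: all replicas see the same value.
        return self.store.incr(session_id)


def simulate(requests=4):
    """Send requests for one session round-robin across two replicas."""
    store = SharedStore()
    replicas = itertools.cycle([Replica("a", store), Replica("b", store)])
    local, shared = [], []
    for _ in range(requests):
        replica = next(replicas)
        local.append(replica.handle_local("s1"))
        shared.append(replica.handle_shared("s1"))
    return local, shared


if __name__ == "__main__":
    local, shared = simulate(4)
    print("local memory:", local)   # counts diverge per replica
    print("shared store:", shared)  # counts stay consistent
```

With round-robin routing, the per-replica counter repeats values because each replica only counts the requests it happened to receive, while the shared counter increases monotonically.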
Notes by Service Type
- Frontends: scale as static web services.
- Main backend: run database migrations in a controlled way rather than from every replica at once, and make sure sessions and configuration are shared across replicas.
- Workers: choose the replica count based on queue and task behavior.
- Tool services: make external provider calls idempotent where retries may happen.
- Agent and MCP services: review thread/session behavior before scaling.
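The tool-service note about idempotent provider calls can be sketched as follows. This is an illustration under assumptions, not the Monkeys implementation: `FakeProvider` is a hypothetical provider that deduplicates on an idempotency key, and `call_with_retries` is a hypothetical helper. Real providers differ in whether and how they accept such a key.

```python
import uuid


class FakeProvider:
    """Hypothetical external provider that deduplicates by idempotency key."""

    def __init__(self, fail_first=1):
        self.records = {}   # idempotency key -> stored payload
        self.calls = 0      # number of requests received
        self._failures_left = fail_first

    def create(self, idempotency_key, payload):
        self.calls += 1
        if idempotency_key not in self.records:
            self.records[idempotency_key] = payload  # side effect applied once
        if self._failures_left > 0:
            # Simulate a lost response: the effect happened, the caller cannot tell.
            self._failures_left -= 1
            raise TimeoutError("response lost")
        return self.records[idempotency_key]


def call_with_retries(provider, payload, attempts=3, key=None):
    """Retry a provider call, reusing one key so retries cannot duplicate the effect."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = key or str(uuid.uuid4())  # one key per logical operation
    last_error = None
    for _ in range(attempts):
        try:
            return provider.create(key, payload)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    provider = FakeProvider(fail_first=1)
    result = call_with_retries(provider, {"amount": 5})
    print(result, provider.calls, len(provider.records))
```

The key is generated once per logical operation and reused on every retry, so a retry after a lost response returns the stored result instead of repeating the side effect.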
Whenever load-balancing behavior changes, verify that WebSocket, SSE, and other streaming endpoints still work.
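One possible shape for an SSE smoke check is sketched below, using only the Python standard library. The names `DemoSSEHandler`, `read_sse_events`, and `run_demo`, and the `/events` path, are hypothetical. The demo server is a local stand-in so the sketch runs on its own; in practice `read_sse_events` would be pointed at the load-balanced address of the service under test.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoSSEHandler(BaseHTTPRequestHandler):
    """Local stand-in for a streaming endpoint; not a Monkeys service."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for i in range(3):
            self.wfile.write(f"data: tick {i}\n\n".encode())
            self.wfile.flush()
            time.sleep(0.05)

    def log_message(self, *args):
        pass  # keep the demo quiet


def read_sse_events(host, port, path, expected, timeout=5.0):
    """Open a stream and read `expected` data events, timing the first one."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    start = time.monotonic()
    conn.request("GET", path, headers={"Accept": "text/event-stream"})
    resp = conn.getresponse()
    content_type = resp.getheader("Content-Type", "")
    events, first_event_s = [], None
    while len(events) < expected:
        line = resp.readline()
        if not line:
            break  # stream closed before all expected events arrived
        text = line.decode().strip()
        if text.startswith("data:"):
            if first_event_s is None:
                first_event_s = time.monotonic() - start
            events.append(text[len("data:"):].strip())
    conn.close()
    return resp.status, content_type, events, first_event_s


def run_demo():
    """Start the stand-in server on a free port and run the check against it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoSSEHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return read_sse_events("127.0.0.1", port, "/events", expected=3)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The time to the first event is returned because a proxy that buffers responses can deliver every event correctly but only after the stream ends; a large first-event delay is a hint of that. This sketch does not cover WebSocket endpoints, which need a separate client.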