Open source project
Butler is a reverse proxy that sits between your clients and the Ollama API. It adds multi-user authentication, per-user model authorization, rate limiting, input filtering, and observability — things Ollama intentionally does not provide.
Quick start
Build from source, create a config, and start the proxy. Clients connect to Butler instead of Ollama directly.
$ make build
$ cp butler.example.yaml butler.yaml
$ ./butler -config butler.yaml
Or use Docker Compose to run Butler alongside Ollama as a turnkey stack:
docker compose up -d
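A minimal Compose file for this stack might look like the following; the image name, volume layout, and port mappings are assumptions for illustration, not taken from the project:

```yaml
services:
  ollama:
    image: ollama/ollama            # upstream model server, not exposed to clients
    volumes:
      - ollama-data:/root/.ollama
  butler:
    build: .                        # assumed: Dockerfile in the repo root
    ports:
      - "8080:8080"                 # only Butler is reachable from outside
    environment:
      - JWT_SECRET=${JWT_SECRET}
    volumes:
      - ./butler.yaml:/butler.yaml:ro
    depends_on:
      - ollama
volumes:
  ollama-data:
```

Note that Ollama publishes no ports, so every request has to pass through Butler.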
Features
- Multi-mode authentication: API keys for services, JWT standalone for homelab users, OIDC federation for Keycloak/Okta/Entra ID — or combine them with "either" mode.
- Per-identity policy: model allowlists/denylists, rate limits, context length caps, and token prediction caps, independently configurable per service key and per authenticated user.
- OIDC federation: validate tokens from external identity providers via JWKS auto-discovery, and map IdP roles to proxy policy without a database.
- Input filtering: regex-based prompt rejection, per-key and global rate limits, and request size limits block requests before they reach the model.
- Observability: Prometheus /metrics endpoint, /healthz health check with upstream verification, and structured JSON logging with user identity. No external services required.
- Minimal footprint: one Go binary, one YAML config, one external dependency (yaml.v3). The Docker image is built FROM scratch.
Architecture
Butler intercepts every request, authenticates the caller (API key, JWT, or OIDC token), extracts the model name from the request body, enforces policy, and forwards to Ollama. Auth headers are stripped so Ollama never sees them.
Client A ──┐
Client B ──┼──▶ [Butler :8080] ──▶ [Ollama :11434]
Client C ──┘
auth / ACL / filter / rate-limit / log
Request flow:
1. Authenticate caller ──▶ API key, JWT, or OIDC token
2. Parse request body ──▶ extract model name
3. Enforce policy ──▶ model ACL, rate limit, input filter
4. Forward to Ollama ──▶ strip auth header
5. Stream response back ──▶ log status + duration + user identity

Configuration
Policy is a YAML file you can version-control alongside your infrastructure. Secrets are referenced as ${ENV_VAR} and expanded at load time.
listen: "127.0.0.1:8080"
upstream: "http://127.0.0.1:11434"
global_rate_limit: "600/min"

auth:
  mode: either  # accept API keys, JWTs, and OIDC tokens
  jwt_secret: "${JWT_SECRET}"

clients:
  - name: my-app
    key: "${MY_APP_KEY}"
    allow_models: ["llama3.2", "mistral"]
    rate_limit: "60/min"
  - name: admin-tool
    key: "${ADMIN_KEY}"
    allow_models: ["*"]

users:
  - name: alice
    password_hash: "$2b$12$..."
    allow_models: ["*"]
  - name: kid1
    password_hash: "$2b$12$..."
    allow_models: ["llama3.2"]
    rate_limit: "20/hour"
    max_ctx: 2048

License
Butler is licensed under the Apache License 2.0. Free to use, modify, and distribute for any purpose.
Deployment
- make build produces a single static binary
- docker compose up -d runs Butler + Ollama as a turnkey stack
- The Docker image is built FROM scratch with no runtime dependencies
If you share an Ollama server across services, a homelab, or a team, Butler gives you authentication, authorization, rate limiting, and observability that Ollama does not ship with. One binary, one config file, zero changes to Ollama or your clients.