Open source project
Butler is a reverse proxy that sits between your clients and the Ollama API. It adds multi-user authentication, per-user model authorization, rate limiting, input filtering, and observability — things Ollama intentionally does not provide.
Quick start
Build from source, create a config, and start the proxy. Clients connect to Butler instead of Ollama directly.
$ make build
$ cp butler.example.yaml butler.yaml
$ ./butler -config butler.yaml
Or use Docker Compose to run Butler alongside Ollama as a turnkey stack:
docker compose up -d
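A minimal Compose file for this stack might look like the following; the image name, volume layout, and port mappings are assumptions for illustration, not taken from the project:

```yaml
services:
  ollama:
    image: ollama/ollama            # upstream model server, not exposed to clients
    volumes:
      - ollama-data:/root/.ollama
  butler:
    build: .                        # assumed: Dockerfile in the repo root
    ports:
      - "8080:8080"                 # only Butler is reachable from outside
    environment:
      - JWT_SECRET=${JWT_SECRET}
    volumes:
      - ./butler.yaml:/butler.yaml:ro
    depends_on:
      - ollama
volumes:
  ollama-data:
```

Note that Ollama publishes no ports, so every request has to pass through Butler.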
Features
- Multi-mode authentication: API keys for services, JWT standalone for homelab users, OIDC federation for Keycloak/Okta/Entra ID — or combine them with "either" mode.
- Per-identity policy: model allowlists/denylists, rate limits, context length caps, and token prediction caps, independently configurable per service key and per authenticated user.
- OIDC federation: validate tokens from external identity providers via JWKS auto-discovery, and map IdP roles to proxy policy without a database.
- Input filtering: regex-based prompt rejection, per-key and global rate limits, and request size limits block requests before they reach the model.
- Observability: Prometheus /metrics endpoint, /healthz health check with upstream verification, and structured JSON logging with user identity. No external services required.
- Minimal footprint: one Go binary, one YAML config, one external dependency (yaml.v3). The Docker image is built FROM scratch.
Architecture
Butler intercepts every request, authenticates the caller (API key, JWT, or OIDC token), extracts the model name from the request body, enforces policy, and forwards to Ollama. Auth headers are stripped so Ollama never sees them.
Client A ──┐
Client B ──┼──▶ [Butler :8080] ──▶ [Ollama :11434]
Client C ──┘
auth / ACL / filter / rate-limit / log
Request flow:
1. Authenticate caller ──▶ API key, JWT, or OIDC token
2. Parse request body ──▶ extract model name
3. Enforce policy ──▶ model ACL, rate limit, input filter
4. Forward to Ollama ──▶ strip auth header
5. Stream response back ──▶ log status + duration + user identity

Configuration
Policy is a YAML file you can version-control alongside your infrastructure. Secrets are referenced as ${ENV_VAR} and expanded at load time.
listen: "127.0.0.1:8080"
upstream: "http://127.0.0.1:11434"
global_rate_limit: "600/min"

auth:
  mode: either  # accept API keys, JWTs, and OIDC tokens
  jwt_secret: "${JWT_SECRET}"

clients:
  - name: my-app
    key: "${MY_APP_KEY}"
    allow_models: ["llama3.2", "mistral"]
    rate_limit: "60/min"
  - name: admin-tool
    key: "${ADMIN_KEY}"
    allow_models: ["*"]

users:
  - name: alice
    password_hash: "$2b$12$..."
    allow_models: ["*"]
  - name: kid1
    password_hash: "$2b$12$..."
    allow_models: ["llama3.2"]
    rate_limit: "20/hour"
    max_ctx: 2048

License
Butler is licensed under the Apache License 2.0. Free to use, modify, and distribute for any purpose.
Deployment
- make build produces a single static binary
- docker compose up -d runs Butler + Ollama as a turnkey stack
- The Docker image is built FROM scratch with no runtime dependencies
If you share an Ollama server across services, a homelab, or a team, Butler gives you authentication, authorization, rate limiting, and observability that Ollama does not ship with. One binary, one config file, zero changes to Ollama or your clients.