LiteLLM Got Backdoored. Here Is What I Built Instead.

A supply chain attack on LiteLLM showed the risk of a privileged AI proxy. Butler fills a similar role for Ollama with a smaller footprint and a static binary.

On March 24, Sonatype reported that compromised LiteLLM packages on PyPI delivered a credential stealer and malware dropper. The malicious releases were 1.82.7 and 1.82.8. Sonatype noted indications of TeamPCP involvement, but attribution was still under investigation at the time of publication.

Sonatype’s researchers explained why the target was so valuable:

LiteLLM typically sits directly between applications and multiple AI service providers, often accessing API keys and environment variables, allowing attackers to intercept sensitive secrets.

This belongs to the same broad family of supply chain attacks as SolarWinds and xz: compromise a component that occupies a privileged position in the stack, and inherit the trust and access that position grants. LiteLLM was not the only target. Reporting on the same campaign also tied it to compromises of Aqua’s Trivy security scanner and Checkmarx’s VS Code extensions, spanning PyPI, GitHub, and npm.

Why AI proxies are high-value targets

An AI proxy is a chokepoint by design. It sees every request, every response, and, critically, every credential used to authenticate with upstream providers. LiteLLM handles API keys for OpenAI, Anthropic, Azure, AWS Bedrock, and dozens of other services. A backdoor in LiteLLM is not limited to the proxy’s own secrets: it can exfiltrate every secret that flows through it.

This is the same reason network monitoring platforms like SolarWinds Orion are targeted. Privileged position in the architecture means privileged access to everything that position touches. The attacker does not need to compromise each downstream service individually. They compromise the proxy once and harvest credentials for all of them.

The attack surface compounds when you consider how LiteLLM is typically deployed: as a shared gateway that multiple services route through, in environments where API keys for cloud AI providers are the most sensitive secrets in the stack.

What I built instead

I run Ollama for local inference, and I needed the same thing LiteLLM provides — a proxy that handles auth, rate limiting, and access control between clients and the model server. So I built Butler.

Butler is an access-control reverse proxy for Ollama. It sits between clients and the Ollama API:

Client A ──┐
Client B ──┼──▶ [butler :8080] ──▶ [ollama :11434]
Client C ──┘
            auth / ACL / filter / limit / log

It handles per-client authentication (API keys, JWT, or OIDC federation), model-level allowlists and denylists, rate limiting, prompt filtering, and structured logging. The feature set covers the same governance gap that LiteLLM fills for cloud providers, but for local infrastructure.
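
As a rough illustration of that request path, here is a minimal Go sketch using only the standard library. It is not Butler's actual code. It assumes clients send an API key as a bearer token and that the model name is the `model` field of the JSON request body, as in Ollama's generate and chat endpoints. The `client` type and the `decide` and `gate` functions are hypothetical names, and the key and model values are placeholders.

```go
package main

import (
	"bytes"
	"crypto/subtle"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// client is a hypothetical policy entry: one API key and the models it may use.
type client struct {
	apiKey string
	models map[string]bool
}

// decide returns the HTTP status for a request. Anything that cannot be
// positively verified is rejected; there is no anonymous or default-allow path.
func decide(clients []client, key, model string) int {
	if key == "" {
		return http.StatusUnauthorized
	}
	for _, c := range clients {
		if subtle.ConstantTimeCompare([]byte(c.apiKey), []byte(key)) == 1 {
			if c.models[model] {
				return http.StatusOK
			}
			return http.StatusForbidden // known client, model not on its allowlist
		}
	}
	return http.StatusUnauthorized // unknown key
}

// gate wraps the upstream handler with the auth and allowlist check.
func gate(clients []client, upstream http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		var req struct {
			Model string `json:"model"`
		}
		// A missing or unparsable body leaves Model empty, which no allowlist
		// contains, so such requests are denied rather than passed through.
		_ = json.Unmarshal(body, &req)
		if status := decide(clients, key, req.Model); status != http.StatusOK {
			http.Error(w, http.StatusText(status), status)
			return
		}
		r.Header.Del("Authorization")                // do not forward the client key upstream
		r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the proxy
		upstream.ServeHTTP(w, r)
	})
}

func main() {
	target, err := url.Parse("http://127.0.0.1:11434") // Ollama port from the diagram
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder policy; a real deployment would load this from config.
	clients := []client{{apiKey: "example-key-a", models: map[string]bool{"llama3": true}}}
	proxy := httputil.NewSingleHostReverseProxy(target)
	log.Fatal(http.ListenAndServe(":8080", gate(clients, proxy)))
}
```

Keeping the decision in a pure function separate from the HTTP plumbing makes the deny-by-default behavior easy to unit test. Note that this sketch also rejects requests that carry no model at all, which a real proxy would likely handle with a more specific rule.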

The difference is in how it is built.

Three dependencies

Butler’s complete list of direct dependencies:

github.com/golang-jwt/jwt/v5   # JWT signing/validation
golang.org/x/crypto             # bcrypt password hashing
gopkg.in/yaml.v3                # YAML config parsing

That is it. Three direct dependencies. The HTTP server, reverse proxy, JSON handling, logging, and metrics all come from the standard library. The Docker image is built FROM scratch, so the container holds a single static binary and nothing else.

LiteLLM’s pyproject.toml carries dozens of dependencies, many of them optional, and each of those pulls in its own transitive tree. The compromised versions were distributed through PyPI, the same package registry that every pip install touches. Every dependency is a link in the supply chain, and every link is a potential point of compromise.

Insight

Dependency count is attack surface

The xz backdoor compromised a single library and reached sshd through a transitive dependency chain. The TeamPCP campaign hit packages across three ecosystems simultaneously. Each dependency you take on is a bet that every maintainer in that subtree, and every build system they use, will remain uncompromised for the lifetime of your deployment. Three dependencies is a different risk profile than dozens.

No cloud keys in the blast radius

LiteLLM’s compromise was so damaging because the proxy handles API keys for external providers. An attacker who controls LiteLLM can harvest OpenAI keys, Anthropic keys, Azure credentials, and anything else the proxy uses to authenticate upstream.

Butler proxies to a local Ollama instance. There are no cloud API keys flowing through it. The most sensitive data Butler handles is the client API keys and JWT secrets in its own config file, which are local secrets you control and can rotate without coordinating with third parties.

This is not an argument against cloud AI services. It is an argument for understanding what sits in the blast radius when a proxy is compromised. A backdoored local proxy can still leak local secrets, prompts, and responses. A backdoored cloud proxy can additionally expose the keys it uses to authenticate with external providers.

Fail closed, single binary, no state

Butler is designed around a few principles that limit what a compromise can reach:

Fail closed. If Butler cannot verify a client, the request is denied. There is no anonymous fallback, no permissive default. An attacker who disrupts Butler’s auth cannot use that disruption to bypass it.

Single static binary. No runtime dependencies, no interpreter, no package manager pulling code at startup. What you build is what you deploy. Because the Docker image is built FROM scratch, the container has no shell and no other tools an attacker could use for lateral movement.

No database, no Redis. Policy is a YAML file. Rate limiting uses in-memory fixed-window counters. There is no external state store to compromise, no connection string to steal, no second service to attack.

Config as code. The YAML config can be version-controlled alongside your infrastructure. Changes are auditable through git history. There is no admin UI, no API for runtime policy changes, no surface for privilege escalation through the management plane.
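
To make the in-memory fixed-window counter idea concrete, here is a minimal Go sketch of that technique. It is an illustration under assumptions, not Butler's implementation: the `limiter` type, its method names, and the choice to align windows to multiples of the window length since the Unix epoch are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counter tracks how many requests a client has made in one window.
type counter struct {
	window int64 // index of the window this count belongs to
	count  int
}

// limiter is a per-client fixed-window rate limiter held entirely in memory.
type limiter struct {
	mu         sync.Mutex
	limit      int   // max requests per client per window
	windowSecs int64 // window length in seconds
	counters   map[string]*counter
}

func newLimiter(limit int, windowSecs int64) *limiter {
	if windowSecs < 1 {
		windowSecs = 1 // avoid division by zero
	}
	return &limiter{limit: limit, windowSecs: windowSecs, counters: map[string]*counter{}}
}

// allowAt reports whether clientID may make a request at the given Unix time.
// Taking the time as a parameter keeps the logic deterministic and testable.
func (l *limiter) allowAt(clientID string, unixSecs int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	idx := unixSecs / l.windowSecs
	c, ok := l.counters[clientID]
	if !ok || c.window != idx {
		// First request from this client, or a new window: start from zero.
		c = &counter{window: idx}
		l.counters[clientID] = c
	}
	if c.count >= l.limit {
		return false
	}
	c.count++
	return true
}

// allow checks the limit against the current wall-clock time.
func (l *limiter) allow(clientID string) bool {
	return l.allowAt(clientID, time.Now().Unix())
}

func main() {
	l := newLimiter(2, 60) // 2 requests per client per 60-second window
	for _, ts := range []int64{0, 1, 2, 60} {
		fmt.Println(ts, l.allowAt("client-a", ts))
	}
	fmt.Println(l.allow("client-b"))
}
```

With a limit of 2 per 60 seconds, the requests at seconds 0 and 1 pass, the one at second 2 is rejected, and the one at second 60 passes again because a new window has begun. A fixed window is simple and needs no external store, at the cost of allowing short bursts around window boundaries. This sketch also never evicts idle clients from the map, and all counts reset when the process restarts.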

The broader lesson

The TeamPCP campaign is a reminder that supply chain attacks target positions, not just packages. LiteLLM was valuable because of where it sits, not because of a specific vulnerability in its code. The same logic applies to any proxy, gateway, or middleware that occupies a chokepoint in your architecture.

If you run local LLMs and need access control, authentication, and rate limiting in front of Ollama, Butler is the tool I built to solve that problem with a deliberately smaller attack surface. Three dependencies, one binary, no cloud provider keys in the request path, fail closed.

The source is on GitHub under Apache 2.0.