# LLM proxy lifecycle

End-to-end workflow for putting an LLM provider behind AIronClaw.

## Concept

An LLM proxy is a gateway-routed endpoint that:

1. Authenticates inbound clients (AIronClaw API key or JWT)
2. Enforces firewall rules (prompt guards, prompt redaction, model routing, rate limits, lambdas, static cache)
3. Forwards the request to the upstream provider using a server-stored provider key (the client never sees it)
4. Counts tokens, computes cost, enforces a windowed USD budget
5. Optionally logs the conversation (encrypted at rest, 7-day TTL)

Supported providers (the value of `provider`):

| Provider    | `upstreamUrl` (auto)                         |
|-------------|----------------------------------------------|
| `openai`    | `https://api.openai.com`                     |
| `anthropic` | `https://api.anthropic.com`                  |
| `google`    | `https://generativelanguage.googleapis.com`  |
| `mistral`   | `https://api.mistral.ai`                     |

`upstreamUrl` is **derived**, not user-settable. Switching `provider` on PATCH swaps it automatically.

## 1. Create the proxy

```bash
curl -fsS -X POST \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm" \
  -d '{
    "name": "openai-prod",
    "provider": "openai",
    "allowedModels": ["gpt-4o-mini", "gpt-4o"],
    "defaultModel": "gpt-4o-mini",
    "providerKey": "sk-proj-...",
    "logConversations": false,
    "budget": { "period": "monthly", "capUsd": 500, "hardBlock": true }
  }' | jq
```
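
The steps below reference the new proxy as `${LLM_ID}`. One way to capture it, assuming the create response returns the proxy object with a top-level `id` (an assumption about the payload shape):

```bash
# Same create call as above, keeping only the id.
# `.id` is an assumed response field; adjust to the real payload shape.
LLM_ID=$(curl -fsS -X POST \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm" \
  -d @create-proxy.json | jq -r '.id')   # create-proxy.json = the body shown above
```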

Field notes:

- **`allowedModels`**: an empty array allows every provider model; otherwise an inbound request naming a model not in the list is rejected with 403.
- **`defaultModel`** is used only by the dashboard and by clients that don't specify a model. The plugin does not auto-rewrite requests.
- **`providerKey`** is encrypted server-side. Once stored, it is never returned. To rotate, PATCH with the new key.
- **`logConversations: true`** opts in to encrypted prompt+completion logging with a 7-day TTL. Off by default — privacy-preserving.
- **`budget.capUsd: 0`** disables the cap (proxy still tracks usage but never blocks). `hardBlock: true` returns HTTP 402 once the cap is exceeded; `false` only sets a header + emits an event.
- **`budget.period`** values: `daily` (UTC midnight), `weekly` (UTC Monday), `monthly` (UTC 1st), `fixed` (manual reset only via `/budget/reset`).

## 2. Inbound auth (same shape as MCP)

The `auth` field on an LLM proxy is identical to MCP's. Default mode is `aifw_api_key` (clients send a key minted via `POST /api/keys` carrying `llm:<id>:model:*`). Alt mode is `jwt`. See [mcp-lifecycle.md § 3b](mcp-lifecycle.md#3b-jwt-verified-against-a-jwks).
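
As a sketch, switching a proxy to JWT verification could look like the following; the field names inside `auth` (`mode`, `jwksUrl`) are illustrative, and mcp-lifecycle.md § 3b has the authoritative shape:

```bash
# Sketch only: the `auth` sub-fields are assumed, not documented here;
# see mcp-lifecycle.md § 3b for the real shape.
curl -fsS -X PATCH \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}" \
  -d '{"auth": {"mode": "jwt", "jwksUrl": "https://idp.example.com/.well-known/jwks.json"}}'
```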

## 3. Mint a scoped client key

```bash
curl -fsS -X POST \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/keys" \
  -d "{
    \"name\": \"openai-prod-app1\",
    \"llmPermissions\": [
      { \"id\": \"${LLM_ID}\", \"models\": [\"gpt-4o-mini\"] }
    ]
  }" | jq -r '.key.key'
```

`models: ["*"]` = unrestricted within the proxy's `allowedModels`.

The client uses the key against `https://${LLM_PROXY_HOST}/v1/chat/completions` (path is provider-specific — same as the upstream provider's API).
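
A minimal client call, assuming the scoped key is presented as a Bearer token like the control-plane token above (check the gateway's auth docs if it expects a different header):

```bash
# Inbound request through the proxy; the provider key never leaves the server.
# CLIENT_KEY is the key minted above; Bearer presentation is assumed.
curl -fsS -X POST \
  -H "Authorization: Bearer ${CLIENT_KEY}" \
  -H "Content-Type: application/json" \
  "https://${LLM_PROXY_HOST}/v1/chat/completions" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello."}]
  }' | jq -r '.choices[0].message.content'
# A model outside the key's scope (or the proxy's allowedModels) is rejected
# with 403; with a hardBlock budget, requests past the cap get 402.
```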

## 4. Apply firewall rules

LLM proxies accept a subset of rule types — see [rules-and-dlp.md § LLM-only rules](rules-and-dlp.md#llm-only-rules). The high-value ones:

- **`prompt_guard`** (regex or judge mode) — block / rewrite / alert on prompt-injection patterns
- **`prompt_replace`** — DLP-style redaction of the prompt before it reaches the provider
- **`model_route`** — rewrite the requested model when the prompt matches a regex (e.g. route reasoning prompts to a stronger model)
- **`rate_limit`** with `match_key=tokens_per_minute` — token-based rate limit (TPM); pre-reserves on access, reconciles on response
- **`static_cache`** — cache idempotent completions (use cautiously with non-deterministic models)

`response_replace` is **rejected** on LLM proxies — LLM responses are not mutated by AIronClaw (only redaction of the prompt is supported).
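
As an illustration only, attaching a `model_route` rule might look like this; the endpoint path and rule JSON below are hypothetical, and rules-and-dlp.md defines the real schema:

```bash
# Hypothetical sketch: the endpoint and field names are illustrative, not the
# documented contract; see rules-and-dlp.md for the real rule schema.
# Routes prompts that look like multi-step reasoning to a stronger model.
curl -fsS -X POST \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/rules" \
  -d '{
    "type": "model_route",
    "match": "(?i)step[- ]by[- ]step|chain of thought",
    "targetModel": "gpt-4o"
  }'
```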

## 5. Budget management

### Read current spend

```bash
curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/usage/daily?days=14" | jq '.window'
```

`window` = `{ tag: "monthly:202604", spentCents: 12345, rollsOverAt: <unix-ts> }`.
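
Note the unit mismatch: the cap is configured in USD while spend is reported in cents. A sketch for computing remaining headroom, assuming the cap can be read back from the proxy object as `budget.capUsd` (an assumed read path):

```bash
# Remaining budget in USD. Assumes GET /api/llm/:id echoes budget.capUsd.
SPENT_CENTS=$(curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/usage/daily?days=1" | jq '.window.spentCents')
CAP_USD=$(curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}" | jq '.budget.capUsd')
echo "remaining: $(echo "${CAP_USD} - (${SPENT_CENTS} / 100)" | bc -l) USD"
```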

### Update the cap

```bash
curl -fsS -X PATCH \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}" \
  -d '{"budget": { "period": "monthly", "capUsd": 1000, "hardBlock": true }}' | jq
```

### Manually reset the spend window

```bash
curl -fsS -X POST -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/budget/reset"
```

### Per-key budgets

A single client key can have its own cap on top of the proxy cap (the stricter wins):

```bash
curl -fsS -X PUT \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/keys/${CRED_ID}/budget" \
  -d '{ "period": "monthly", "capUsd": 50, "hardBlock": true }'
```

Reset: `POST .../keys/${CRED_ID}/budget/reset`. Remove: `DELETE .../keys/${CRED_ID}/budget`.
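
Spelled out against the same base path as the PUT above:

```bash
# Reset the key's spend window
curl -fsS -X POST -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/keys/${CRED_ID}/budget/reset"

# Remove the per-key cap (the proxy-level cap still applies)
curl -fsS -X DELETE -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/keys/${CRED_ID}/budget"
```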

## 6. Conversation logs

Only populated if the proxy was created with `logConversations: true`.
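
If the proxy was created with logging off, you can opt in later; a sketch assuming `logConversations` is PATCHable like the other fields in this guide:

```bash
# Opt in to encrypted conversation logging (7-day TTL).
# Assumes logConversations is accepted on PATCH, like budget and providerKey.
curl -fsS -X PATCH \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}" \
  -d '{"logConversations": true}'
```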

```bash
# List recent
curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/logs?limit=50" | jq

# Read full transcript
curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/logs/${LOG_ID}" | jq

# Bulk delete older than a day
curl -fsS -X DELETE -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/logs?before=$(($(date +%s) - 86400))"
```

## 7. Day-2 operations

- **Rotate the upstream provider key**: `PATCH /api/llm/:id` with `{"providerKey": "<new>"}` (spelled out after this list). The old key is overwritten in encrypted form; the proxy keeps serving without restart.
- **Switch provider**: `PATCH /api/llm/:id` with `{"provider": "anthropic"}`. `upstreamUrl` is updated automatically. You probably also need to update `allowedModels` and rules.
- **Force DNS re-resolve**: `POST /api/llm/:id/re-resolve`.
- **Delete**: `DELETE /api/llm/:id` — same teardown as MCP.
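
The rotation PATCH from the first bullet, in full:

```bash
# Rotate the upstream provider key in place; no restart needed.
curl -fsS -X PATCH \
  -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  -H "Content-Type: application/json" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}" \
  -d '{"providerKey": "sk-proj-NEW-..."}'
```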

## Pricing & cost computation

Token counts come from the provider response; cost is computed against the pinned pricing table version (`pricingVersion` field on usage rows). When provider pricing changes, AIronClaw versions the table — older log rows keep their original cost computation, new requests use the new table. Don't try to recompute costs client-side from usage rows; trust `costCents`.
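
If you want a quick total, sum the reported values rather than re-deriving them; a sketch, assuming the daily usage response carries per-day rows under a `days` array with `costCents` (the array name is an assumption):

```bash
# Total reported spend over 14 days, in USD. `.days[]` is an assumed
# field name for the per-day usage rows; costCents is authoritative.
curl -fsS -H "Authorization: Bearer ${AIRONCLAW_TOKEN}" \
  "${AIRONCLAW_BASE_URL}/api/llm/${LLM_ID}/usage/daily?days=14" \
  | jq '([.days[].costCents] | add) / 100'
```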
