Configuration

Pond reads its settings from the environment (a .env file in dev). The .env.example file is the authoritative list of variables; this page is the annotated version. Unset optional values fall back to safe defaults.

Core

Variable	Purpose
`POND_DATABASE_URL`	Postgres DSN for pond’s engine database.
`POND_SOURCE_KEY`	Fernet key that wraps source-bundle data keys at rest. Generate with `Fernet.generate_key()`. Required to fetch/seal sources.
`POND_STATE_DIR`	Working dir for run checkouts, outputs, and artifacts (default `.pond-state`).
`POND_PUBLIC_URL`	Pond’s externally reachable base URL (used where pond advertises itself).
`POND_ENV_LABEL`	Human label (e.g. `staging`) surfaced on `/health`.
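
A value for `POND_SOURCE_KEY` can be produced with `Fernet.generate_key()` from the `cryptography` package. A Fernet key is 32 random bytes, URL-safe base64 encoded, so the standard-library sketch below yields a key of the same format and adds a shape check that can run before startup. The function names `generate_source_key` and `looks_like_fernet_key` are hypothetical and are not part of pond.

```python
import base64
import os


def generate_source_key() -> str:
    """Return a Fernet-format key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(value: str) -> bool:
    """Shape check only: the value must decode to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except ValueError:  # covers bad padding and non-ASCII input
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # Print a line that can be pasted into a .env file.
    print(f"POND_SOURCE_KEY={generate_source_key()}")
```

The check confirms only the key's shape, not that it matches the key used to seal existing sources.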

Authentication

Variable	Purpose
`POND_SERVICE_TOKEN`	Trusted first-party bearer for `/v1` — acts across projects (supply `projectId` per request). Unset = first-party access disabled.
`AUTH0_DOMAIN`, `AUTH0_AUDIENCE`	Tenant + audience for the Auth0-gated admin routes (`/v1/keys`). Set to placeholders locally if you only use the service token.
`POND_ADMIN_AUTH0_SUBS`	Comma-separated Auth0 subjects allowed to call admin routes. Empty = unrestricted (dev only).

Per-project /v1 keys (pond_pk_…) are minted via POST /v1/keys, not env.

Source delivery / object store

The source bundle (ciphertext) lives in any S3-compatible store.

Variable	Purpose
`POND_BUNDLE_BUCKET`	Bucket for encrypted source bundles. Unset = bundle delivery disabled (pooled runs that need sources will fail closed).
`POND_BUNDLE_ENDPOINT_URL`	S3-compatible endpoint (AWS, R2, MinIO, …).
`POND_BUNDLE_REGION`, `POND_BUNDLE_ACCESS_KEY_ID`, `POND_BUNDLE_SECRET_ACCESS_KEY`	Object-store credentials.
`POND_BUNDLE_PATH_STYLE`	Use path-style addressing (MinIO etc.).
`POND_BUNDLE_URL_TTL_SEC`	Lifetime of the presigned GET handed to a worker.
`POND_BUNDLE_MAX_BYTES`	Cap on a downloaded bundle (worker-side defense).
`POND_TARBALL_MAX_BYTES`, `POND_S3_MAX_BYTES`	Caps on tarball / S3-prefix source fetches.
`POND_LOCAL_SOURCE_ROOT`	Allowed root for `folder` sources — gates pointing a run at arbitrary host paths.
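
To illustrate how the bundle variables fit together, here is a minimal sketch that collects them from an environment mapping and treats an unset bucket as "delivery disabled". The helper `bundle_settings`, the dictionary keys, and the accepted truthy spellings for `POND_BUNDLE_PATH_STYLE` are assumptions made for illustration; pond's own parsing may differ.

```python
import os
from typing import Mapping, Optional


def bundle_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Collect POND_BUNDLE_* values; None means bundle delivery is disabled."""
    bucket = env.get("POND_BUNDLE_BUCKET", "").strip()
    if not bucket:
        return None  # unset bucket = bundle delivery disabled
    settings = {
        "bucket": bucket,
        "endpoint_url": env.get("POND_BUNDLE_ENDPOINT_URL") or None,
        "region": env.get("POND_BUNDLE_REGION") or None,
        # Assumption: which spellings count as "on" is not documented here.
        "path_style": env.get("POND_BUNDLE_PATH_STYLE", "").lower()
        in {"1", "true", "yes"},
    }
    # Optional integers stay None when unset, so the caller can apply
    # whatever default pond itself uses.
    for key, name in (
        ("url_ttl_sec", "POND_BUNDLE_URL_TTL_SEC"),
        ("max_bytes", "POND_BUNDLE_MAX_BYTES"),
    ):
        raw = env.get(name, "").strip()
        settings[key] = int(raw) if raw else None
    return settings


# Example values for a local MinIO instance (hypothetical).
minio_env = {
    "POND_BUNDLE_BUCKET": "pond-bundles",
    "POND_BUNDLE_ENDPOINT_URL": "http://localhost:9000",
    "POND_BUNDLE_PATH_STYLE": "true",
    "POND_BUNDLE_URL_TTL_SEC": "300",
}
print(bundle_settings(minio_env))
```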

Run resource governance

Backstops that auto-cancel a runaway run mid-flight. 0 = disabled. A run may tighten (never loosen) these per-run via definition.budget.
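
The "tighten, never loosen" rule can be sketched as a small function that combines a deployment-level limit with a per-run request. The name `effective_limit` is hypothetical, and the handling of a per-run value of 0 (treated here as "no per-run request" so that it cannot disable a deployment backstop) is an assumption rather than documented behavior.

```python
from typing import Optional


def effective_limit(deployment_limit: int, run_limit: Optional[int]) -> int:
    """Combine a deployment backstop with a per-run budget; 0 means disabled."""
    if run_limit is None or run_limit <= 0:
        # No per-run request; asking for "disabled" must not loosen the backstop.
        return deployment_limit
    if deployment_limit <= 0:
        # No deployment backstop, so any per-run limit is a tightening.
        return run_limit
    # Both set: the stricter (smaller) value wins.
    return min(deployment_limit, run_limit)


print(effective_limit(3600, 600))   # run tightens the limit
print(effective_limit(3600, 7200))  # run cannot loosen the limit
```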

Variable	Purpose
`POND_RUN_WALLCLOCK_BUDGET_SEC`	Max wall-clock per run before auto-cancel.
`POND_RUN_MAX_COST_CENTS`	Cost ceiling — cancels pre-emptively when live spend crosses it.
`POND_RUN_WATCHDOG_INTERVAL_SEC`	How often the watchdog re-checks (default 10s).
`POND_DISPATCH_NO_WORKER_GRACE_SEC`	How long a job may sit queued with no worker able to serve it before the stage fails fast with an actionable error (default 90s; tolerates a worker attaching shortly after submit).
`POND_RUN_LEASE_TTL_SEC`	How long a run’s ownership lease is valid between heartbeats (default 60s). A replica owns the runs it executes and heartbeats the lease; the reaper reclaims a run only once its lease expires (owner dead/hung), never a live one. Keep comfortably larger than the watchdog + reaper intervals.
`POND_RUN_MAX_ATTEMPTS`	Max times a run is started (default 2). A reclaimed run (dead owner) is re-queued for another replica until it hits this cap, then fails — so a replica death is survivable, but a genuinely-broken run can’t loop forever. 1 = no retry.
`POND_BUILD`	Build identity (git sha/tag), baked at image build (`--build-arg POND_BUILD=…`). Reported by `/health` and `/v1/status` so you can confirm which code is running — no stale-image guessing.
`POND_REQUIRE_SANDBOX`	When `true`, the unconfined `none` sandbox profile is refused (preflight 422 + dispatch fail-closed). Turn it on for any deployment that runs untrusted code, so a run can’t land on unconfined by accident. Default `false` (trusted-code / dev). Either way, a `none` stage always produces an `unconfined_sandbox` preflight warning — it’s never silent.
`POND_REQUIRED_RUN_LABELS`	Comma-separated cost-attribution label keys every run must carry (e.g. `costCenter,team`). A submit/preflight missing any is rejected with a `labels_required` 422 — so there’s no untagged spend to break chargeback. Empty (default) = no requirement. See cost-attribution.
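
The interaction between `POND_RUN_LEASE_TTL_SEC` and `POND_RUN_MAX_ATTEMPTS` can be sketched as one decision function. This models the behavior described in the table, not pond's actual reaper; the name `reaper_decision`, its parameters, and the returned strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def reaper_decision(
    last_heartbeat: datetime,
    now: datetime,
    attempts_started: int,
    lease_ttl_sec: int = 60,  # POND_RUN_LEASE_TTL_SEC default
    max_attempts: int = 2,    # POND_RUN_MAX_ATTEMPTS default
) -> str:
    """Return 'leave', 'requeue', or 'fail' for a run the reaper inspects."""
    if now - last_heartbeat <= timedelta(seconds=lease_ttl_sec):
        return "leave"  # lease still valid: never reclaim a live run
    if attempts_started < max_attempts:
        return "requeue"  # owner dead or hung: hand the run to another replica
    return "fail"  # attempt cap reached: stop retrying


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
stale = now - timedelta(seconds=120)
print(reaper_decision(stale, now, attempts_started=1))
print(reaper_decision(stale, now, attempts_started=2))
```

With `max_attempts=1`, the first expired lease already yields `fail`, which matches the "1 = no retry" note.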

Operator doctor

GET /v1/status (operator-gated) reports build id, migration code-head vs DB-head (stale-image / pending-migration check), control-plane capabilities (is git present for cloning sources?), pool readiness, and config counts: one call to answer “is this deployment actually healthy and up to date?”.

Variable	Purpose
`POND_ADMIN_AUTH0_SUBS`	Comma-separated Auth0 subjects allowed to use the operator API (`/v1/admin`, `/v1/keys`). Fail-closed: empty grants operator rights to nobody via Auth0; use the service token, or set this.

Observability (OpenTelemetry)

Off until an endpoint is set. See Observability.

Variable	Purpose
`OTEL_EXPORTER_OTLP_ENDPOINT`	OTLP collector URL (e.g. `http://collector:4318`). Setting it enables traces + metrics export.
`OTEL_SERVICE_NAME`	Service name in telemetry (default `pond`).
`POND_OTEL_CONSOLE`	`1` → also print spans + metrics to stderr (local debugging, no collector needed).
`OTEL_*`	All standard OpenTelemetry env (sampler, resource attributes, headers, …) is honored.

Worker-side (the `swarm` tool)

Set on worker/orchestrator hosts, not on pond:

Variable	Purpose
`SWARM_GVISOR_RUNTIME`	OCI runtime name for the gVisor backend (default `runsc`).
`SWARM_KATA_RUNTIME`	OCI runtime name for the Kata/Firecracker backend (auto-detected if unset).
`SWARM_BACKEND_URL`, `SWARM_BACKEND_TOKEN`	Pond coordinates for `swarm serve` (or read from the paired `backend-handle.json`).