Configuration

Pond reads settings from the environment (.env in dev). .env.example is the authoritative list; this is the annotated version. Unset optional values fall back to safe defaults.
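
As a rough illustration of "unset falls back to a default", the sketch below looks up a variable and falls back to the defaults documented on this page. It is not pond's loader: the `read_setting` helper and the `DOCUMENTED_DEFAULTS` table are hypothetical, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Defaults copied from this page; the loader itself is a hypothetical sketch.
DOCUMENTED_DEFAULTS = {
    "POND_STATE_DIR": ".pond-state",
    "POND_RUN_WATCHDOG_INTERVAL_SEC": "10",
    "POND_DISPATCH_NO_WORKER_GRACE_SEC": "90",
    "POND_RUN_LEASE_TTL_SEC": "60",
    "POND_RUN_MAX_ATTEMPTS": "2",
    "POND_REQUIRE_SANDBOX": "false",
    "OTEL_SERVICE_NAME": "pond",
}


def read_setting(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value from env, else the documented default, else None."""
    value = env.get(name)
    if value is None or value == "":  # assumption: empty string counts as unset
        return DOCUMENTED_DEFAULTS.get(name)
    return value
```

For example, `read_setting("POND_STATE_DIR", {})` yields `.pond-state`, while a variable with no documented default (such as POND_DATABASE_URL) yields `None` when unset.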

Core

| Variable | Purpose |
| --- | --- |
| POND_DATABASE_URL | Postgres DSN for pond’s engine database. |
| POND_SOURCE_KEY | Fernet key that wraps source-bundle data keys at rest. Generate with Fernet.generate_key(). Required to fetch/seal sources. |
| POND_STATE_DIR | Working dir for run checkouts, outputs, and artifacts (default .pond-state). |
| POND_PUBLIC_URL | Pond’s externally reachable base URL (used where pond advertises itself). |
| POND_ENV_LABEL | Human label (e.g. staging) surfaced on /health. |
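
One way to produce a value for POND_SOURCE_KEY is sketched below. It calls `Fernet.generate_key()` from the `cryptography` package when that package is installed; otherwise it falls back to the equivalent standard-library construction (a Fernet key is 32 random bytes, URL-safe base64 encoded). The `generate_source_key` helper name is hypothetical.

```python
import base64
import os


def generate_source_key() -> str:
    """Return a fresh Fernet key as text, suitable for POND_SOURCE_KEY."""
    try:
        from cryptography.fernet import Fernet
    except ImportError:
        # Fallback: same format Fernet uses (32 random bytes, URL-safe base64).
        return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
    return Fernet.generate_key().decode("ascii")
```

Printing the result of `generate_source_key()` gives a 44-character string to paste into .env. Treat it like any other secret: losing it presumably makes already-sealed bundles unreadable.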

Authentication

| Variable | Purpose |
| --- | --- |
| POND_SERVICE_TOKEN | Trusted first-party bearer for /v1; acts across projects (supply projectId per request). Unset = first-party access disabled. |
| AUTH0_DOMAIN, AUTH0_AUDIENCE | Tenant + audience for the Auth0-gated admin routes (/v1/keys). Set to placeholders locally if you only use the service token. |
| POND_ADMIN_AUTH0_SUBS | Comma-separated Auth0 subjects allowed to call admin routes. Empty = unrestricted (dev only). |

Per-project /v1 keys (pond_pk_…) are minted via POST /v1/keys, not env.

Source delivery / object store

The source bundle (ciphertext) lives in any S3-compatible store.

| Variable | Purpose |
| --- | --- |
| POND_BUNDLE_BUCKET | Bucket for encrypted source bundles. Unset = bundle delivery disabled (pooled runs that need sources will fail closed). |
| POND_BUNDLE_ENDPOINT_URL | S3-compatible endpoint (AWS, R2, MinIO, …). |
| POND_BUNDLE_REGION, POND_BUNDLE_ACCESS_KEY_ID, POND_BUNDLE_SECRET_ACCESS_KEY | Object-store credentials. |
| POND_BUNDLE_PATH_STYLE | Use path-style addressing (MinIO etc.). |
| POND_BUNDLE_URL_TTL_SEC | Lifetime of the presigned GET handed to a worker. |
| POND_BUNDLE_MAX_BYTES | Cap on a downloaded bundle (worker-side defense). |
| POND_TARBALL_MAX_BYTES, POND_S3_MAX_BYTES | Caps on tarball / S3-prefix source fetches. |
| POND_LOCAL_SOURCE_ROOT | Allowed root for folder sources (gates pointing a run at arbitrary host paths). |
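
To make the idea of a byte cap such as POND_BUNDLE_MAX_BYTES concrete, the sketch below reads a stream in chunks and aborts as soon as the cap is exceeded, rather than trusting a declared length. This is an illustration only, not the worker's actual code; `read_capped` and `BundleTooLarge` are hypothetical names.

```python
import io
from typing import BinaryIO


class BundleTooLarge(Exception):
    """Raised when a download exceeds the configured byte cap."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the stream fully, aborting as soon as more than max_bytes arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Stop early: at most max_bytes + chunk_size bytes are ever held.
            raise BundleTooLarge(f"download exceeded {max_bytes} bytes")
```

With `io.BytesIO(b"x" * 10)` and a cap of 10 the call returns all ten bytes; one byte more raises `BundleTooLarge`.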

Run resource governance

Backstops that auto-cancel a runaway run mid-flight. 0 = disabled. A run may tighten (never loosen) these per-run via definition.budget.
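
The "tighten, never loosen" rule can be sketched as follows. This is an assumed reading of the rule, not pond's implementation: `effective_limit` is a hypothetical helper, and a per-run value of 0 or None is treated as "not set", so it cannot switch off a deployment backstop.

```python
from typing import Optional


def effective_limit(deployment_limit: int, run_limit: Optional[int]) -> int:
    """Combine a deployment backstop with a per-run value; 0 means disabled.

    The result is never looser than the deployment limit.
    """
    if run_limit is None or run_limit <= 0:
        return deployment_limit  # run did not set a limit
    if deployment_limit <= 0:
        return run_limit  # no backstop configured, so any run limit is tighter
    return min(deployment_limit, run_limit)
```

Under this reading, a deployment limit of 3600 with a run request of 600 gives 600, while a run request of 7200 is clamped back to 3600.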

| Variable | Purpose |
| --- | --- |
| POND_RUN_WALLCLOCK_BUDGET_SEC | Max wall-clock per run before auto-cancel. |
| POND_RUN_MAX_COST_CENTS | Cost ceiling; cancels pre-emptively when live spend crosses it. |
| POND_RUN_WATCHDOG_INTERVAL_SEC | How often the watchdog re-checks (default 10s). |
| POND_DISPATCH_NO_WORKER_GRACE_SEC | How long a job may sit queued with no worker able to serve it before the stage fails fast with an actionable error (default 90s; tolerates a worker attaching shortly after submit). |
| POND_RUN_LEASE_TTL_SEC | How long a run’s ownership lease is valid between heartbeats (default 60s). A replica owns the runs it executes and heartbeats the lease; the reaper reclaims a run only once its lease expires (owner dead/hung), never a live one. Keep comfortably larger than the watchdog + reaper intervals. |
| POND_RUN_MAX_ATTEMPTS | Max times a run is started (default 2). A reclaimed run (dead owner) is re-queued for another replica until it hits this cap, then fails, so a replica death is survivable but a genuinely broken run can’t loop forever. 1 = no retry. |
| POND_BUILD | Build identity (git sha/tag), baked at image build (--build-arg POND_BUILD=…). Reported by /health and /v1/status so you can confirm which code is running, with no stale-image guessing. |
| POND_REQUIRE_SANDBOX | When true, the unconfined "none" sandbox profile is refused (preflight 422 + dispatch fail-closed). Turn it on for any deployment that runs untrusted code, so a run can’t land on an unconfined sandbox by accident. Default false (trusted-code / dev). Either way, a "none" stage always produces an unconfined_sandbox preflight warning; it’s never silent. |
| POND_REQUIRED_RUN_LABELS | Comma-separated cost-attribution label keys every run must carry (e.g. costCenter,team). A submit/preflight missing any is rejected with a labels_required 422, so there’s no untagged spend to break chargeback. Empty (default) = no requirement. See cost-attribution. |
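
The POND_REQUIRED_RUN_LABELS check can be pictured as the small sketch below: split the comma-separated value into keys and report which ones a run's labels lack. Both helper names are hypothetical, and the real validation (which answers with the labels_required 422) may differ in detail, for example in how it treats empty label values.

```python
from typing import List, Mapping


def parse_required_labels(raw: str) -> List[str]:
    """Split a comma-separated value into label keys, ignoring blanks."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def missing_labels(required: List[str], labels: Mapping[str, str]) -> List[str]:
    """Return the required keys that the run's labels do not carry."""
    return [key for key in required if key not in labels]
```

With the value `costCenter,team` and run labels `{"team": "infra"}`, `missing_labels` returns `["costCenter"]`; an empty setting parses to an empty list, so nothing is required.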

Operator doctor

GET /v1/status (operator-gated) reports build id, migration code-head vs DB-head (stale-image / pending-migration check), control-plane capabilities (is git present for cloning sources?), pool readiness, and config counts. It is one call to answer “is this deployment actually healthy and up to date?”.

| Variable | Purpose |
| --- | --- |
| POND_ADMIN_AUTH0_SUBS | Comma-separated Auth0 subjects allowed to use the operator API (/v1/admin, /v1/keys). Fail-closed: empty grants operator rights to nobody via Auth0; use the service token, or set this. |
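
A minimal sketch of preparing that call with the standard library follows. It assumes the service token is sent as an `Authorization: Bearer …` header and that the token is accepted on /v1/status; neither detail is confirmed on this page. The `build_status_request` helper is hypothetical, and the response format is deliberately not parsed because it is not documented here.

```python
import urllib.request


def build_status_request(base_url: str, service_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/status request."""
    url = base_url.rstrip("/") + "/v1/status"
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: the bearer token travels in the Authorization header.
        headers={"Authorization": f"Bearer {service_token}"},
    )
```

Passing the result to `urllib.request.urlopen(...)` sends the request; `base_url` would typically be the value of POND_PUBLIC_URL and `service_token` the value of POND_SERVICE_TOKEN.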

Observability (OpenTelemetry)

Off until an endpoint is set. See Observability.

| Variable | Purpose |
| --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector URL (e.g. http://collector:4318). Setting it enables traces + metrics export. |
| OTEL_SERVICE_NAME | Service name in telemetry (default pond). |
| POND_OTEL_CONSOLE | Set to 1 to also print spans + metrics to stderr (local debugging, no collector needed). |
| OTEL_* | All standard OpenTelemetry env (sampler, resource attributes, headers, …) is honored. |

Worker-side (the swarm tool)

Set on worker/orchestrator hosts, not on pond:

| Variable | Purpose |
| --- | --- |
| SWARM_GVISOR_RUNTIME | OCI runtime name for the gVisor backend (default runsc). |
| SWARM_KATA_RUNTIME | OCI runtime name for the Kata/Firecracker backend (auto-detected if unset). |
| SWARM_BACKEND_URL, SWARM_BACKEND_TOKEN | Pond coordinates for swarm serve (or read from the paired backend-handle.json). |

See also: Quickstart · Worker pool.