Configuration
Pond reads settings from the environment (.env in dev). .env.example is the
authoritative list; this is the annotated version. Unset optional values fall
back to safe defaults.
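The fallback behaviour can be pictured with a minimal sketch. It assumes a hypothetical `load_settings` helper and a plain dict shape, neither of which is Pond's actual loader; only the variable names and default values are taken from the tables in this document.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read a few documented variables, applying the documented defaults.

    Hypothetical helper for illustration; Pond's real loader may differ.
    """
    env = os.environ if env is None else env

    def _int(name: str, default: int) -> int:
        # Treat an unset or blank variable as "use the default".
        raw = env.get(name, "")
        return int(raw) if raw.strip() else default

    return {
        "state_dir": env.get("POND_STATE_DIR") or ".pond-state",
        "watchdog_interval_sec": _int("POND_RUN_WATCHDOG_INTERVAL_SEC", 10),
        "no_worker_grace_sec": _int("POND_DISPATCH_NO_WORKER_GRACE_SEC", 90),
        "lease_ttl_sec": _int("POND_RUN_LEASE_TTL_SEC", 60),
        "max_attempts": _int("POND_RUN_MAX_ATTEMPTS", 2),
        "otel_service_name": env.get("OTEL_SERVICE_NAME") or "pond",
    }


if __name__ == "__main__":
    # With nothing set, every value falls back to its documented default.
    print(load_settings({}))
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the sketch easy to exercise in isolation.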
Core
| Variable | Purpose |
|---|---|
| POND_DATABASE_URL | Postgres DSN for pond’s engine database. |
| POND_SOURCE_KEY | Fernet key that wraps source-bundle data keys at rest. Generate with Fernet.generate_key(). Required to fetch/seal sources. |
| POND_STATE_DIR | Working dir for run checkouts, outputs, and artifacts (default .pond-state). |
| POND_PUBLIC_URL | Pond’s externally reachable base URL (used where pond advertises itself). |
| POND_ENV_LABEL | Human label (e.g. staging) surfaced on /health. |
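For POND_SOURCE_KEY, the documented route is `Fernet.generate_key()` from the `cryptography` package. As a sketch for hosts where that package is not installed, the following uses only the standard library to produce a key in the same format (32 random bytes, URL-safe base64 encoded) and to sanity-check a candidate value. The function names are hypothetical and are not part of Pond.

```python
import base64
import os


def generate_fernet_style_key() -> bytes:
    # A Fernet key is 32 random bytes, URL-safe base64 encoded (44 characters).
    return base64.urlsafe_b64encode(os.urandom(32))


def looks_like_fernet_key(value: bytes) -> bool:
    # Shape check only: decodes as URL-safe base64 to exactly 32 bytes.
    try:
        return len(base64.urlsafe_b64decode(value)) == 32
    except (ValueError, TypeError):
        return False


if __name__ == "__main__":
    key = generate_fernet_style_key()
    print(f"POND_SOURCE_KEY={key.decode('ascii')}")
```

The shape check cannot tell whether a key is the one that actually sealed existing bundles; it only catches truncated or mis-pasted values early.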
Authentication
| Variable | Purpose |
|---|---|
| POND_SERVICE_TOKEN | Trusted first-party bearer token for /v1. It acts across projects (supply projectId per request). Unset = first-party access disabled. |
| AUTH0_DOMAIN, AUTH0_AUDIENCE | Tenant + audience for the Auth0-gated admin routes (/v1/keys). Set to placeholders locally if you only use the service token. |
| POND_ADMIN_AUTH0_SUBS | Comma-separated Auth0 subjects allowed to call admin routes. Empty = unrestricted (dev only). |
Per-project /v1 keys (pond_pk_…) are minted via POST /v1/keys, not set through the environment.
Source delivery / object store
The source bundle (ciphertext) lives in any S3-compatible store.
| Variable | Purpose |
|---|---|
| POND_BUNDLE_BUCKET | Bucket for encrypted source bundles. Unset = bundle delivery disabled (pooled runs that need sources will fail closed). |
| POND_BUNDLE_ENDPOINT_URL | S3-compatible endpoint (AWS, R2, MinIO, …). |
| POND_BUNDLE_REGION, POND_BUNDLE_ACCESS_KEY_ID, POND_BUNDLE_SECRET_ACCESS_KEY | Object-store region and credentials. |
| POND_BUNDLE_PATH_STYLE | Use path-style addressing (MinIO etc.). |
| POND_BUNDLE_URL_TTL_SEC | Lifetime, in seconds, of the presigned GET URL handed to a worker. |
| POND_BUNDLE_MAX_BYTES | Cap on a downloaded bundle (worker-side defense). |
| POND_TARBALL_MAX_BYTES, POND_S3_MAX_BYTES | Caps on tarball / S3-prefix source fetches. |
| POND_LOCAL_SOURCE_ROOT | Allowed root for folder sources. It prevents a run from being pointed at arbitrary host paths. |
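The kind of gate POND_LOCAL_SOURCE_ROOT describes can be sketched as a path-containment check. This is an illustration under assumptions, not Pond's implementation: the `is_within_root` helper and the example paths are hypothetical.

```python
from pathlib import Path


def is_within_root(candidate: str, root: str) -> bool:
    """Return True if candidate resolves to root itself or a path beneath it."""
    # resolve() normalizes ".." segments and follows symlinks where they exist,
    # so "/root/../etc" cannot slip past a plain string-prefix comparison.
    root_path = Path(root).resolve()
    candidate_path = Path(candidate).resolve()
    return candidate_path == root_path or root_path in candidate_path.parents


if __name__ == "__main__":
    root = "/srv/pond-sources"  # hypothetical POND_LOCAL_SOURCE_ROOT value
    for path in ("/srv/pond-sources/app", "/srv/pond-sources/../etc/passwd"):
        print(path, "->", "allowed" if is_within_root(path, root) else "rejected")
```

Comparing resolved paths rather than raw strings is the important design choice: a prefix test on the unresolved string would accept the second path above.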
Run resource governance
Backstops that auto-cancel a runaway run mid-flight. 0 = disabled. A run may
tighten (never loosen) these per-run via definition.budget.
| Variable | Purpose |
|---|---|
| POND_RUN_WALLCLOCK_BUDGET_SEC | Max wall-clock time per run before auto-cancel. |
| POND_RUN_MAX_COST_CENTS | Cost ceiling. The run is cancelled pre-emptively when live spend crosses it. |
| POND_RUN_WATCHDOG_INTERVAL_SEC | How often the watchdog re-checks (default 10s). |
| POND_DISPATCH_NO_WORKER_GRACE_SEC | How long a job may sit queued with no worker able to serve it before the stage fails fast with an actionable error (default 90s; tolerates a worker attaching shortly after submit). |
| POND_RUN_LEASE_TTL_SEC | How long a run’s ownership lease is valid between heartbeats (default 60s). A replica owns the runs it executes and heartbeats the lease; the reaper reclaims a run only once its lease expires (owner dead/hung), never a live one. Keep comfortably larger than the watchdog + reaper intervals. |
| POND_RUN_MAX_ATTEMPTS | Max times a run is started (default 2). A reclaimed run (dead owner) is re-queued for another replica until it hits this cap, then fails. A replica death is therefore survivable, but a genuinely broken run can’t loop forever. 1 = no retry. |
| POND_BUILD | Build identity (git sha/tag), baked in at image build (--build-arg POND_BUILD=…). Reported by /health and /v1/status so you can confirm which code is running without guessing whether the image is stale. |
| POND_REQUIRE_SANDBOX | When true, the unconfined none sandbox profile is refused (preflight 422 + dispatch fail-closed). Turn it on for any deployment that runs untrusted code, so a run can’t land on unconfined by accident. Default false (trusted-code / dev). Either way, a none stage always produces an unconfined_sandbox preflight warning, so it is never silent. |
| POND_REQUIRED_RUN_LABELS | Comma-separated cost-attribution label keys every run must carry (e.g. costCenter,team). A submit/preflight missing any is rejected with a labels_required 422, so no untagged spend can break chargeback. Empty (default) = no requirement. See cost-attribution. |
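The "tighten, never loosen" rule for per-run budgets can be sketched as follows. The `effective_limit` helper is hypothetical, and two behaviours are assumptions rather than documented facts: that a missing or non-positive per-run value means "no per-run limit", and that a run may set a limit when the deployment-level backstop is disabled (0).

```python
from typing import Optional


def effective_limit(deployment_limit: int, run_limit: Optional[int]) -> int:
    """Combine a deployment backstop with an optional per-run budget.

    A result of 0 means no limit applies, mirroring "0 = disabled".
    """
    # Assumption: a missing or non-positive per-run value requests nothing.
    if run_limit is None or run_limit <= 0:
        return deployment_limit
    # Assumption: with the backstop disabled, any per-run limit is a tightening.
    if deployment_limit == 0:
        return run_limit
    # The run can only lower the ceiling, never raise it.
    return min(deployment_limit, run_limit)


if __name__ == "__main__":
    print(effective_limit(3600, 600))   # run tightens the deployment limit
    print(effective_limit(3600, 7200))  # attempt to loosen is ignored
```

The same function applies to either backstop, for example POND_RUN_WALLCLOCK_BUDGET_SEC in seconds or POND_RUN_MAX_COST_CENTS in cents.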
Operator doctor
GET /v1/status (operator-gated) reports the build id, migration code-head vs
DB-head (a stale-image / pending-migration check), control-plane capabilities
(is git present for cloning sources?), pool readiness, and config counts. One
call answers “is this deployment actually healthy and up to date?”.
| Variable | Purpose |
|---|---|
| POND_ADMIN_AUTH0_SUBS | Comma-separated Auth0 subjects allowed to use the operator API (/v1/admin, /v1/keys). Fail-closed: empty grants operator rights to nobody via Auth0. Use the service token, or set this. |
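A status check could be scripted along these lines. This is a sketch under assumptions: that the service token is accepted on GET /v1/status as a standard `Authorization: Bearer` header, that the response body is JSON, and that the base URL is whatever POND_PUBLIC_URL points at. The helper names are hypothetical.

```python
import json
import os
import urllib.request


def build_status_request(base_url: str, token: str) -> urllib.request.Request:
    # Assumption: the bearer token travels in a standard Authorization header.
    url = base_url.rstrip("/") + "/v1/status"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def fetch_status(base_url: str, token: str, timeout: float = 10.0) -> dict:
    # Assumption: the endpoint answers with a JSON document.
    request = build_status_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    status = fetch_status(
        os.environ["POND_PUBLIC_URL"], os.environ["POND_SERVICE_TOKEN"]
    )
    print(json.dumps(status, indent=2))
```

The response fields are not specified here, so the sketch prints the whole document rather than picking out individual keys.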
Observability (OpenTelemetry)
Off until an endpoint is set. See Observability.
| Variable | Purpose |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector URL (e.g. http://collector:4318). Setting it enables traces + metrics export. |
| OTEL_SERVICE_NAME | Service name in telemetry (default pond). |
| POND_OTEL_CONSOLE | 1 → also print spans + metrics to stderr (local debugging, no collector needed). |
| OTEL_* | All standard OpenTelemetry env (sampler, resource attributes, headers, …) is honored. |
Worker-side (the swarm tool)
Set on worker/orchestrator hosts, not on pond:
| Variable | Purpose |
|---|---|
| SWARM_GVISOR_RUNTIME | OCI runtime name for the gVisor backend (default runsc). |
| SWARM_KATA_RUNTIME | OCI runtime name for the Kata/Firecracker backend (auto-detected if unset). |
| SWARM_BACKEND_URL, SWARM_BACKEND_TOKEN | Pond coordinates for swarm serve (or read from the paired backend-handle.json). |
See also: Quickstart · Worker pool.