Worker source delivery

How a run’s source tree reaches an untrusted worker without exposing it on the wire or to the object store. Companion threat model in ../run-trust.md.

Code: app/crypto/bundles.py (encrypt, pond side), app/bundle_store.py (object store), app/routers/orch.py (/orch/bundle/{token,redeem}), swarm/src/security.py::materialize_bundle (decrypt, worker side).

Why a bundle

The worker is untrusted (BYO hosts, Tier B) and may sit anywhere. We can’t hand it the backend’s checkout directory or a plaintext archive in object storage. So pond ships an envelope-encrypted bundle: ciphertext in the object store, the data key sealed per-job to the claiming worker’s public key, and integrity pinned by a content hash. The object store sees only ciphertext; the wire carries only a one-time token; only the claiming worker can decrypt.

Flow

  1. Package (pond, after fetch): the runner tars the checkout, encrypts it with a fresh data key (streaming AEAD), and uploads the ciphertext. It records runs.source_bundle = {object_key, key_ct (Fernet), nonce_b64, sha256, size}. The data key is never stored raw: key_ct is the data key wrapped at rest with the pond source key (Fernet).
  2. Claim: the claim response carries source_bundle = {token, sha256, size} — a one-time redeem token plus the expected hash + size cap. No key, no URL.
  3. Redeem (/orch/bundle/redeem): the worker presents the token + its enrollment-pinned pubkey. Pond unwraps the data key and seals it to that worker’s X25519 pubkey (see model-credentials), returning the sealed key + a short-lived presigned GET URL. Fail-closed: if the worker has no pinned pubkey, pond returns no key; the data key is never sent in cleartext.
  4. Materialize (worker): materialize_bundle streams the ciphertext to a temp file, hashing it as it goes and enforcing a worker-side byte cap (defense-in-depth even though the GET is presigned), and checks the resulting sha256 against the expected hash from the claim. It then unseals the data key with its private key, decrypts the stream (decrypt_stream), and safe-extracts the tar (safe_extract rejects absolute paths, .. traversal, and symlinks that escape the destination: the zip-slip defense).
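
The download half of step 4 can be sketched as below. This is a minimal illustration, not the code in swarm/src/security.py: the names download_verified and BundleRejected are hypothetical, and it assumes sha256 is carried as a hex string and that the HTTP body arrives as an iterable of byte chunks.

```python
import hashlib
import hmac
import io
from typing import BinaryIO, Iterable


class BundleRejected(Exception):
    """Raised when the download exceeds the cap or fails the hash check."""


def download_verified(chunks: Iterable[bytes], out: BinaryIO,
                      expected_sha256: str, max_bytes: int) -> int:
    """Copy chunks to out, enforcing the byte cap and checking sha256."""
    digest = hashlib.sha256()
    total = 0
    for chunk in chunks:
        total += len(chunk)
        # Enforce the cap before writing, so an oversized body never
        # lands on disk in full.
        if total > max_bytes:
            raise BundleRejected("bundle exceeds size cap")
        digest.update(chunk)
        out.write(chunk)
    # Compare only after the whole stream has been hashed.
    if not hmac.compare_digest(digest.hexdigest(), expected_sha256.lower()):
        raise BundleRejected("sha256 mismatch")
    return total


# Usage with an in-memory stand-in for the presigned GET body.
payload = [b"cipher", b"text"]
expected = hashlib.sha256(b"".join(payload)).hexdigest()
sink = io.BytesIO()
written = download_verified(payload, sink, expected, max_bytes=1024)
```

On rejection the caller would discard the temp file and stop; nothing is unsealed, decrypted, or extracted until this function returns.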

The bundle object is best-effort deleted when the run reaches a terminal state (runner._delete_source_bundle); a bucket TTL is the backstop for crashes.

Wire format (MUST match byte-for-byte)

app/crypto/bundles.py (encrypt) and swarm/src/security.py (decrypt) implement the same streaming AEAD format: chunked ChaCha20-Poly1305 with a per-chunk nonce derived from (base nonce, chunk index, final flag). The two are separate codebases (pond vs the worker tool); the format is the contract. Any change to framing, chunk size, nonce derivation, or the final-chunk marker must land in both, or decryption on the worker breaks with nothing to flag the mismatch ahead of time. The shared crypto lib is intentionally config-free (keys passed in as bytes).
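
To make the nonce derivation concrete, here is one possible layout, shown only as an illustration. The actual byte layout is whatever app/crypto/bundles.py and swarm/src/security.py agree on; the function name chunk_nonce, the 7-byte prefix, the 4-byte big-endian counter, and the 1-byte final flag are all assumptions chosen to fill the 12-byte ChaCha20-Poly1305 nonce.

```python
import struct

NONCE_LEN = 12   # ChaCha20-Poly1305 (RFC 8439) nonce size in bytes
PREFIX_LEN = 7   # assumed: bytes taken from the per-bundle base nonce


def chunk_nonce(base_nonce: bytes, index: int, final: bool) -> bytes:
    """Build prefix || 4-byte big-endian chunk index || 1-byte final flag."""
    if len(base_nonce) < PREFIX_LEN:
        raise ValueError("base nonce too short")
    if not 0 <= index < 2**32:
        raise ValueError("chunk index out of range")
    flag = b"\x01" if final else b"\x00"
    nonce = base_nonce[:PREFIX_LEN] + struct.pack(">I", index) + flag
    assert len(nonce) == NONCE_LEN
    return nonce
```

Binding the index into the nonce means reordered or duplicated chunks fail authentication, and binding the final flag means a stream truncated at a chunk boundary fails too, because its last chunk was not sealed as final. Both sides must compute exactly the same bytes, which is why the derivation is part of the wire contract.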

Invariants

  • The object store only ever holds ciphertext; the data key is never stored or transmitted in clear.
  • The data key is sealed to the specific claiming worker — a redeemed token is useless to anyone else.
  • Integrity (sha256) and a size cap are enforced during download, before decrypt/extract.
  • Extraction can never write outside the destination directory.
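
The last invariant can be illustrated with a small extractor. This is a sketch under assumptions, not the real safe_extract: the names safe_extract_sketch and UnsafeMember are hypothetical, and the policy shown (validate each member against the on-disk state immediately before extracting it, and reject anything that is not a file, directory, or contained link) is one reasonable way to meet the invariant.

```python
import os
import tarfile


class UnsafeMember(Exception):
    """Raised when a tar member would land outside the destination."""


def _inside(base: str, target: str) -> bool:
    # realpath resolves symlinks that already exist on disk, so links
    # extracted earlier in the same archive are taken into account.
    base = os.path.realpath(base)
    return os.path.commonpath([base, os.path.realpath(target)]) == base


def safe_extract_sketch(tar: tarfile.TarFile, dest: str) -> None:
    # Newer Python versions ship a built-in "data" filter; use it as a
    # second layer when available.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for m in tar:
        path = os.path.join(dest, m.name)
        if os.path.isabs(m.name) or not _inside(dest, path):
            raise UnsafeMember(f"path escapes destination: {m.name}")
        if m.issym() or m.islnk():
            # Symlink targets are relative to the link's directory,
            # hard-link targets to the archive root.
            base = os.path.dirname(path) if m.issym() else dest
            target = os.path.join(base, m.linkname)
            if os.path.isabs(m.linkname) or not _inside(dest, target):
                raise UnsafeMember(f"link escapes destination: {m.name}")
        elif not (m.isfile() or m.isdir()):
            # Devices, FIFOs and other special members are refused.
            raise UnsafeMember(f"unsupported member type: {m.name}")
        # Extract only after this member has passed every check.
        tar.extract(m, dest, **extra)
```

Validating and extracting one member at a time matters: a purely lexical check over the whole member list can be defeated by an earlier symlink that changes where a later member's path really resolves.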