Worker source delivery
How a run’s source tree reaches an untrusted worker without exposing it on
the wire or to the object store. Companion threat model in
../run-trust.md.
Code: app/crypto/bundles.py (encrypt, pond side), app/bundle_store.py
(object store), app/routers/orch.py (/orch/bundle/{token,redeem}),
swarm/src/security.py::materialize_bundle (decrypt, worker side).
Why a bundle
The worker is untrusted (BYO hosts, Tier B) and may sit anywhere. We can’t hand it the backend’s checkout directory or a plaintext archive in object storage. So pond ships an envelope-encrypted bundle: ciphertext in the object store, the data key sealed per-job to the claiming worker’s public key, and integrity pinned by a content hash. The object store sees only ciphertext; the wire carries only a one-time token; only the claiming worker can decrypt.
Flow
- Package (pond, after fetch): the runner tars the checkout, encrypts it with a fresh data key (streaming AEAD), and uploads the ciphertext. It records runs.source_bundle = {object_key, key_ct (Fernet), nonce_b64, sha256, size}; the data key is itself wrapped at rest with the pond source key (Fernet).
- Claim: the claim response carries source_bundle = {token, sha256, size}: a one-time redeem token plus the expected hash and size cap. No key, no URL.
- Redeem (/orch/bundle/redeem): the worker presents the token and its enrollment-pinned pubkey. Pond unwraps the data key and seals it to that worker’s X25519 pubkey (see model-credentials), returning the sealed key and a short-lived presigned GET URL. Fail-closed: no pinned pubkey → no key (never cleartext).
- Materialize (worker): materialize_bundle streams the ciphertext to a temp file, computing sha256 as it goes and enforcing a worker-side byte cap (defense-in-depth even though the GET is presigned). It then unseals the data key with its private key, decrypts the stream (decrypt_stream), and safe-extracts the tar. safe_extract rejects absolute paths, .. traversal, and symlinks that escape the destination: the zip-slip defense.
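The download half of the Materialize step can be sketched as follows. This is a minimal illustration, not the code in swarm/src/security.py: the function name download_verified, the BundleIntegrityError type, and the 64 KiB read size are hypothetical, and it assumes the sha256 from the claim response is a hex digest of the ciphertext.

```python
import hashlib
from typing import BinaryIO


class BundleIntegrityError(Exception):
    """Raised when the downloaded ciphertext fails the size or hash check."""


def download_verified(src: BinaryIO, dst: BinaryIO, expected_sha256: str,
                      max_bytes: int, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, enforcing a byte cap and checking sha256 at the end.

    Hypothetical helper: src stands in for the presigned GET response body,
    dst for the temp file. Returns the number of bytes copied.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Enforce the cap before writing, so an oversized body is never
        # fully buffered to disk.
        if total > max_bytes:
            raise BundleIntegrityError(f"bundle exceeds cap of {max_bytes} bytes")
        digest.update(chunk)
        dst.write(chunk)
    # The hash is only known once the stream ends; the caller must not
    # decrypt or extract anything before this check passes.
    if digest.hexdigest() != expected_sha256.lower():
        raise BundleIntegrityError("sha256 mismatch")
    return total
```

On either error the caller would be expected to delete the temp file and abandon the run rather than retry with relaxed limits.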
The bundle object is best-effort deleted when the run reaches a terminal state
(runner._delete_source_bundle); a bucket TTL is the backstop for crashes.
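The one-time and fail-closed properties of the Redeem step above can be illustrated with an in-memory stand-in. Everything here is hypothetical (TokenStore, PendingBundle, RedeemError); the real handler lives in app/routers/orch.py and would use a persistent store with an atomic consume. Sealing the data key to the worker's pubkey is deliberately left out.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


class RedeemError(Exception):
    """Raised when a redeem request must be refused."""


@dataclass
class PendingBundle:
    run_id: str
    worker_id: str


class TokenStore:
    """Hypothetical in-memory token store, for illustration only."""

    def __init__(self) -> None:
        self._pending: Dict[str, PendingBundle] = {}

    def issue(self, run_id: str, worker_id: str) -> str:
        # Unguessable token handed out in the claim response.
        token = secrets.token_urlsafe(32)
        self._pending[token] = PendingBundle(run_id, worker_id)
        return token

    def redeem(self, token: str, pinned_pubkey: Optional[bytes]) -> PendingBundle:
        # pop() makes the token single-use: a replay finds nothing.
        entry = self._pending.pop(token, None)
        if entry is None:
            raise RedeemError("unknown or already-redeemed token")
        # Fail closed: without a pinned pubkey there is nothing to seal to,
        # so no key material is released. In this sketch the token is
        # consumed even on this path (an assumed, conservative choice).
        if not pinned_pubkey:
            raise RedeemError("no pinned pubkey; refusing to release key")
        return entry
```

A caller would seal the unwrapped data key to pinned_pubkey only after redeem returns successfully.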
Wire format (MUST match byte-for-byte)
app/crypto/bundles.py (encrypt) and swarm/src/security.py (decrypt) implement
the same streaming AEAD format — chunked ChaCha20-Poly1305 with a per-chunk
nonce derived from (base nonce, chunk index, final flag). The two are separate
codebases (pond vs the worker tool); the format is the contract. Any change to
framing, chunk size, nonce derivation, or the final-chunk marker must land in
both; otherwise the worker fails to decrypt bundles, and the mismatch only surfaces at run time. The shared crypto lib is intentionally
config-free (keys passed in as bytes).
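To make the contract concrete, here is one plausible per-chunk nonce derivation from (base nonce, chunk index, final flag). It is an assumption for illustration, modelled on the common STREAM-style construction, and is not necessarily the layout that app/crypto/bundles.py uses; the function name chunk_nonce and the 4-byte counter width are hypothetical.

```python
import struct

NONCE_LEN = 12  # nonce size of IETF ChaCha20-Poly1305, in bytes


def chunk_nonce(base_nonce: bytes, index: int, final: bool) -> bytes:
    """Derive a unique nonce for one chunk (assumed layout, see lead-in).

    The base nonce is XORed with: 7 zero bytes, a 4-byte big-endian chunk
    counter, and a 1-byte final flag.
    """
    if len(base_nonce) != NONCE_LEN:
        raise ValueError("base nonce must be 12 bytes")
    if not 0 <= index < 2**32:
        raise ValueError("chunk index out of range")
    suffix = b"\x00" * 7 + struct.pack(">IB", index, 1 if final else 0)
    return bytes(a ^ b for a, b in zip(base_nonce, suffix))
```

Binding the final flag into the nonce means a stream truncated at a chunk boundary fails authentication, because the last chunk the reader sees was not encrypted as final. Whatever the real layout is, a small set of fixed test vectors checked in both codebases is one way to catch drift between the encrypt and decrypt sides.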
Invariants
- The object store only ever holds ciphertext; the data key is never stored or transmitted in clear.
- The data key is sealed to the specific claiming worker — a redeemed token is useless to anyone else.
- Integrity (sha256) and a size cap are enforced during download, before decrypt/extract.
- Extraction can never write outside the destination directory.
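The extraction invariant can be sketched as a pre-flight check over tar members. This is a simplified illustration of the zip-slip defense described in the flow, not the actual safe_extract; the names check_member, extract_checked, and UnsafeArchiveError are hypothetical, POSIX paths are assumed, and special files such as devices are not handled.

```python
import os
import tarfile


class UnsafeArchiveError(Exception):
    """Raised when a tar member would land outside the destination."""


def _is_inside(base: str, target: str) -> bool:
    # realpath collapses ".." and resolves symlinks already on disk.
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    return os.path.commonpath([base, target]) == base


def check_member(member: tarfile.TarInfo, dest: str) -> None:
    """Reject absolute paths, traversal, and links that escape dest."""
    if os.path.isabs(member.name):
        raise UnsafeArchiveError(f"absolute path: {member.name}")
    target = os.path.join(dest, member.name)
    if not _is_inside(dest, target):
        raise UnsafeArchiveError(f"path traversal: {member.name}")
    if member.issym():
        # Symlink targets resolve relative to the link's own directory.
        link_target = os.path.join(os.path.dirname(target), member.linkname)
    elif member.islnk():
        # Hard-link targets are named relative to the archive root.
        link_target = os.path.join(dest, member.linkname)
    else:
        return
    if not _is_inside(dest, link_target):
        raise UnsafeArchiveError(f"link escapes destination: {member.name}")


def extract_checked(tar: tarfile.TarFile, dest: str) -> None:
    """Validate every member first, then extract only the validated list."""
    members = tar.getmembers()
    for member in members:
        check_member(member, dest)
    tar.extractall(dest, members=members)
```

Validating the whole archive before writing anything avoids leaving a half-extracted tree behind when a late member is rejected. On Python 3.12 and later, passing filter="data" to extractall applies comparable built-in checks and could be layered on top.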