Observability
Pond is instrumented with OpenTelemetry — distributed traces and metrics, exported over OTLP to any collector (Grafana Tempo/Mimir, Honeycomb, Datadog, an OTel Collector, …).
It’s off until you configure an endpoint: with no exporter set, no providers are installed and every span/metric is a no-op, so dev and tests carry zero overhead.
Enable it
Set the standard OpenTelemetry endpoint and run pond:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318  # OTLP/HTTP
export OTEL_SERVICE_NAME=pond                             # optional (default: pond)
uv run uvicorn app.main:app --port 8001
```
All the standard OTEL_* environment variables are honored (sampler, resource attributes, headers, …). For a quick local look without a collector:
```bash
export POND_OTEL_CONSOLE=1  # print spans + metrics to stderr
```
deployment.environment is set from POND_ENV_LABEL when present.
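
The gating described above can be sketched in a few lines of Python. This is an illustration only, not pond's actual startup code: `plan_telemetry` is a hypothetical helper, and treating only the value `1` as enabling `POND_OTEL_CONSOLE` is an assumption based on the example shown.

```python
from typing import Mapping


def plan_telemetry(env: Mapping[str, str]) -> dict:
    """Decide which exporters would be installed for a given environment.

    Hypothetical helper: it mirrors the documented behaviour, not pond's code.
    """
    exporters = []
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        exporters.append("otlp")
    # Assumption: "1" enables console output; other values are not documented.
    if env.get("POND_OTEL_CONSOLE") == "1":
        exporters.append("console")

    # Service name falls back to the documented default.
    resource = {"service.name": env.get("OTEL_SERVICE_NAME", "pond")}
    # deployment.environment is only set when POND_ENV_LABEL is present.
    if env.get("POND_ENV_LABEL"):
        resource["deployment.environment"] = env["POND_ENV_LABEL"]

    # With no exporters, nothing is installed and instrumentation stays a no-op.
    return {"enabled": bool(exporters), "exporters": exporters, "resource": resource}


plan = plan_telemetry(
    {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector:4318", "POND_ENV_LABEL": "staging"}
)
print(plan)
```

Passing an empty mapping returns `enabled: False`, which corresponds to the default dev and test setup where no providers are installed.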
What you get
Traces — auto-instrumented HTTP (FastAPI), DB (SQLAlchemy), outbound HTTP (httpx/urllib) and S3 (botocore), plus pond’s own run lifecycle:
| Span | Covers |
|---|---|
| run.execute | one run, start → terminal (its own trace; a run outlives the submit request) |
| run.stage | one stage’s execution, nested under run.execute |
Each carries pond.run_id, pond.project_id, pond.stage, pond.executor,
pond.status. Log records are stamped with trace/span ids so logs and traces
correlate.
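
To show how log/trace correlation of this kind can work, here is a minimal sketch using only the standard library. It is not pond's implementation: `TraceContextFilter`, `build_logger`, the logger name, and the log format are all hypothetical. The ids come from an injected callable so the example runs without OpenTelemetry installed; in a real setup they would typically be read from `opentelemetry.trace.get_current_span().get_span_context()`.

```python
import io
import logging
from typing import Callable, Tuple


class TraceContextFilter(logging.Filter):
    """Annotate each log record with the active trace and span ids (hex)."""

    def __init__(self, current_ids: Callable[[], Tuple[int, int]]) -> None:
        super().__init__()
        self._current_ids = current_ids

    def filter(self, record: logging.LogRecord) -> bool:
        trace_id, span_id = self._current_ids()
        # OpenTelemetry ids are 128-bit (trace) and 64-bit (span) integers,
        # conventionally rendered as 32 and 16 lowercase hex characters.
        record.trace_id = format(trace_id, "032x")
        record.span_id = format(span_id, "016x")
        return True  # annotate only, never drop the record


def build_logger(current_ids: Callable[[], Tuple[int, int]], stream) -> logging.Logger:
    """Build a logger whose output lines carry trace_id and span_id fields."""
    logger = logging.getLogger("pond.example")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(
            "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
        )
    )
    handler.addFilter(TraceContextFilter(current_ids))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Usage: fixed ids stand in for the ids of the currently active span.
buffer = io.StringIO()
logger = build_logger(lambda: (0xABC, 0x1F), buffer)
logger.info("stage finished")
print(buffer.getvalue(), end="")
```

With ids in every log line, a trace id copied from a log record can be pasted into the tracing backend to find the matching run.execute or run.stage span.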
Metrics:
| Metric | Type | Key attributes |
|---|---|---|
| pond.runs.started | counter | trigger |
| pond.runs.finished | counter | status |
| pond.run.duration (s) | histogram | status |
| pond.stage.duration (s) | histogram | stage, status |
| pond.watchdog.cancels | counter | reason (wallclock/cost) |
| pond.model.calls | counter | model |
| pond.model.tokens | counter | model, direction |
| pond.model.cost_cents | counter | model |
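
The "key attributes" column determines how each metric is split into series: every distinct combination of attribute values is aggregated separately. The sketch below illustrates that with a small in-memory stand-in, not the OpenTelemetry SDK and not pond's code. `AttributeCounter`, the model name `example-model`, and the `input`/`output` direction values are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple

SeriesKey = Tuple[Tuple[str, str], ...]


class AttributeCounter:
    """In-memory stand-in for a monotonic counter keyed by attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._series: Dict[SeriesKey, int] = defaultdict(int)

    def add(self, amount: int, attributes: Mapping[str, str]) -> None:
        if amount < 0:
            raise ValueError("counters are monotonic; amount must be >= 0")
        # Sort attributes so the same combination always maps to one series.
        self._series[tuple(sorted(attributes.items()))] += amount

    def series(self) -> Dict[SeriesKey, int]:
        return dict(self._series)


# Hypothetical model name and direction values, for illustration only.
tokens = AttributeCounter("pond.model.tokens")
tokens.add(120, {"model": "example-model", "direction": "input"})
tokens.add(40, {"model": "example-model", "direction": "output"})
tokens.add(30, {"model": "example-model", "direction": "input"})

for key, total in tokens.series().items():
    print(dict(key), total)
```

Three recordings produce two series here because two of them share the same model and direction. The same reasoning explains why low-cardinality attributes such as status or stage are used as keys rather than per-run ids.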
What this is not
- No /metrics scrape endpoint. Pond pushes via OTLP; point Pond at a collector (the collector can re-expose Prometheus if you want pull).
- Per-run logs + cost stay first-class on /v1. GET /v1/runs/{id}/logs and …/usage remain the product-facing surface; OTel is the operational view. The two correlate via trace ids in the logs.
See also: Configuration · Concepts.