Skip to content

Data Sources & Snapshots

A data source is a named dataset attached to a project. Data can come from:

  • CSV/Excel upload — via ingest_data_source with kind: "file_upload" (max 100 MB)
  • Google Sheets — via ingest_data_source with kind: "google_sheet_snapshot" (Picker-connected sheet)

All uploaded data is converted to Parquet format and stored in S3. Each upload creates an immutable snapshot.

A snapshot is a point-in-time copy of your data. When a model runs, it mounts the most recent snapshot for its data sources.

  • Reproducibility — re-running a model against the same snapshot produces the same results
  • Audit trail — you can see exactly which data a model run used
  • Safety — uploading new data doesn’t retroactively change past runs
ToolDescription
list_snapshotsList snapshots with dates and sizes
delete_snapshotDelete an unreferenced snapshot (cannot delete if used by a model run)

Before writing a query_data SQL statement, use list_data_sources to see all data sources attached to a project, their DuckDB table names, and column schemas. This avoids guessing table or column names:

list_data_sources → inspect tables/columns → query_data

list_data_sources returns one entry per source with tables (exact DuckDB table names to use in SQL) and schema (column names and types per table). For Google Sheet sources, snapshot_taken_at shows when the data was last synced — call ingest_data_source (kind: google_sheet_snapshot) to refresh stale snapshots.

The query_data tool lets you run read-only SQL (SELECT/WITH) against your data sources via DuckDB. It supports both uploaded Parquet files and Google Sheet snapshots:

  • Uploaded files (ingest_data_source kind: file_upload) — table name is the source_name given at upload time.
  • Google Sheet snapshots (ingest_data_source kind: google_sheet_snapshot) — each tab becomes a table named {source_name}_{tab_name} (non-alphanumeric characters in the tab name are replaced with underscores).
-- Query an uploaded Parquet file (source_name = "revenue_data"):
SELECT product_line, SUM(revenue) AS total
FROM revenue_data
GROUP BY product_line
ORDER BY total DESC
LIMIT 10
-- Query a linked Google Sheet (source = "budget", tab = "Q1 Revenue"):
SELECT * FROM budget_Q1_Revenue LIMIT 10

Results are capped at 10,000 rows. The sources field in the response lists each loaded source, its DuckDB table names, and snapshot_taken_at (for Google Sheet sources) so you can verify freshness.

The ingest_data_source tool (kind: google_sheet_snapshot) snapshots a Picker-connected Google Sheet into a project:

  1. Bridge Town reads the specified tabs (or all tabs)
  2. Converts each tab to CSV, uploads to S3 as Parquet
  3. Records the snapshot metadata for future model runs and freshness checks

The sheet must be connected once through the Bridge Town web app (Picker-based OAuth) before the tool can run. Bridge Town uses drive.file scope — it can only access Sheets you explicitly selected or that it created for you.

After running a model, write outputs back to the sheet using write_gsheet:

  • spec.mode="replace" — writes to a targeted A1 notation range (e.g. "Forecast!A1" or "Sheet1!B2:F20"). Only cells within the range are overwritten; content outside is preserved.
  • spec.mode="append" — adds rows after existing content in the named tab without overwriting.

You can also create a new Google Sheet as a write target using write_gsheet with target.kind="new_spreadsheet" — Bridge Town creates the Sheet in your Drive and registers it as a project data source in one call.