Skip to content

Data Sources & Snapshots

A data source is a named dataset attached to a project. Data can come from:

  • CSV/Excel upload — via the upload_data tool (max 100 MB)
  • Google Sheets — via the connect_google_sheet tool (OAuth2 linked)

All uploaded data is converted to Parquet format and stored in S3. Each upload creates an immutable snapshot.

A snapshot is a point-in-time copy of your data. When a model runs, it mounts the most recent snapshot for its data sources.

  • Reproducibility — re-running a model against the same snapshot produces the same results
  • Audit trail — you can see exactly which data a model run used
  • Safety — uploading new data doesn’t retroactively change past runs
ToolDescription
list_snapshotsList snapshots with dates and sizes
delete_snapshotDelete an unreferenced snapshot (cannot delete if used by a model run)

Before writing a query_data SQL statement, use list_data_sources to see all data sources attached to a project, their DuckDB table names, and column schemas. This avoids guessing table or column names:

list_data_sources → inspect tables/columns → query_data

list_data_sources returns one entry per source with tables (exact DuckDB table names to use in SQL) and schema (column names and types per table). For Google Sheet sources, snapshot_taken_at shows when the data was last synced — call connect_google_sheet to refresh stale snapshots.

The query_data tool lets you run read-only SQL (SELECT/WITH) against your data sources via DuckDB. It supports both uploaded Parquet files and Google Sheet snapshots:

  • Uploaded files (upload_data) — table name is the source_name given at upload time.
  • Google Sheet snapshots (connect_google_sheet) — each tab becomes a table named {source_name}_{tab_name} (non-alphanumeric characters in the tab name are replaced with underscores).
-- Query an uploaded Parquet file (source_name = "revenue_data"):
SELECT product_line, SUM(revenue) AS total
FROM revenue_data
GROUP BY product_line
ORDER BY total DESC
LIMIT 10
-- Query a linked Google Sheet (source = "budget", tab = "Q1 Revenue"):
SELECT * FROM budget_Q1_Revenue LIMIT 10

Results are capped at 10,000 rows. The sources field in the response lists each loaded source, its DuckDB table names, and snapshot_taken_at (for Google Sheet sources) so you can verify freshness.

The connect_google_sheet tool links a Google Sheet to a project:

  1. Bridge Town reads the specified tabs (or all tabs)
  2. Converts each tab to CSV, uploads to S3 as Parquet
  3. Stores an encrypted OAuth refresh token for future refreshes

After running a model, write outputs back to the sheet using write_gsheet:

  • spec.mode="replace" — writes to a targeted A1 notation range (e.g. "Forecast!A1" or "Sheet1!B2:F20"). Only cells within the range are overwritten; content outside is preserved.
  • spec.mode="append" — adds rows after existing content in the named tab without overwriting.