Data Sources & Snapshots

Data sources

A data source is a named dataset attached to a project. Data can come from:

CSV/Excel upload — via the upload_data tool (max 100 MB)
Google Sheets — via the connect_google_sheet tool (OAuth2 linked)

All uploaded data is converted to Parquet format and stored in S3. Each upload creates an immutable snapshot.

Snapshots

A snapshot is a point-in-time copy of your data. When a model runs, it mounts the most recent snapshot for its data sources.

Why snapshots?

Reproducibility — re-running a model against the same snapshot produces the same results
Audit trail — you can see exactly which data a model run used
Safety — uploading new data doesn’t retroactively change past runs

Snapshot tools

Tool	Description
`list_snapshots`	List snapshots with dates and sizes
`delete_snapshot`	Delete an unreferenced snapshot (cannot delete if used by a model run)

Discovering data sources

Before writing a query_data SQL statement, use list_data_sources to see all data sources attached to a project, their DuckDB table names, and column schemas. This avoids guessing table or column names:

list_data_sources → inspect tables/columns → query_data

list_data_sources returns one entry per source with tables (exact DuckDB table names to use in SQL) and schema (column names and types per table). For Google Sheet sources, snapshot_taken_at shows when the data was last synced — call connect_google_sheet to refresh stale snapshots.

Querying data

The query_data tool lets you run read-only SQL (SELECT/WITH) against your data sources via DuckDB. It supports both uploaded Parquet files and Google Sheet snapshots:

Uploaded files (upload_data) — table name is the source_name given at upload time.
Google Sheet snapshots (connect_google_sheet) — each tab becomes a table named {source_name}_{tab_name} (non-alphanumeric characters in the tab name are replaced with underscores).

-- Query an uploaded Parquet file (source_name = "revenue_data"):
SELECT product_line, SUM(revenue) AS total
FROM revenue_data
GROUP BY product_line
ORDER BY total DESC
LIMIT 10

-- Query a linked Google Sheet (source = "budget", tab = "Q1 Revenue"):
SELECT * FROM budget_Q1_Revenue LIMIT 10

Results are capped at 10,000 rows. The sources field in the response lists each loaded source, its DuckDB table names, and snapshot_taken_at (for Google Sheet sources) so you can verify freshness.

Google Sheets integration

The connect_google_sheet tool links a Google Sheet to a project:

Bridge Town reads the specified tabs (or all tabs)
Converts each tab to CSV, uploads to S3 as Parquet
Stores an encrypted OAuth refresh token for future refreshes

After running a model, write outputs back to the sheet using write_gsheet:

spec.mode="replace" — writes to a targeted A1 notation range (e.g. "Forecast!A1" or "Sheet1!B2:F20"). Only cells within the range are overwritten; content outside is preserved.
spec.mode="append" — adds rows after existing content in the named tab without overwriting.