SQL

This page covers client-side SQL using DuckDB. To run a SQL query on a remote database such as PostgreSQL or Snowflake, use a data loader.

Framework includes built-in support for client-side SQL powered by DuckDB. You can use SQL to query data from CSV, TSV, JSON, Apache Arrow, Apache Parquet, and DuckDB database files, which can either be static or generated by data loaders.

To use SQL, first register the desired tables in the page’s front matter using the sql option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named gaia from a Parquet file:

---
sql:
  gaia: ./lib/gaia-sample.parquet
---

To load externally-hosted data, use a full URL:

---
sql:
  quakes: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
---

For performance and reliability, we recommend using local files rather than loading data from external servers at runtime. You can use a data loader to take a snapshot of a remote data during build if needed.

You can also register tables via code (say to have sources that are defined dynamically via user input) by defining the sql symbol with DuckDBClient.sql. To register DuckDB extensions, use the duckdb config option.

SQL code blocks

To run SQL queries, create a SQL fenced code block (```sql). For example, to query the first 10 rows from the gaia table:

```sql
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```

This produces a table:

To refer to the results of a query in JavaScript, use the id directive. For example, to refer to the results of the previous query as top10:

```sql id=top10
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```

This returns an array of 10 rows as an Apache Arrow table, inspected here:

top10

When a SQL code block uses the id directive, the results are not displayed by default. You can display them by adding the display directive, which produces the table shown above.

```sql id=top10 display
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```

The id directive is often a simple identifier such as top10 above, but it supports destructuring assignment, so you can refer to individual rows and columns using array and object patterns. For example, to pull out the top row:

```sql id=[top]
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 1
```

top

Or to pull out the minimum value of the phot_g_mean_mag column:

```sql id=[{min}]
SELECT MIN(phot_g_mean_mag) AS min FROM gaia
```

min

For complex destructuring patterns, you may need to quote the id directive. For example, to pull out the column named min(phot_g_mean_mag) to the variable named min, say id="[{'min(phot_g_mean_mag)': min}]". Or to pull out the min and max columns, say id="[{min, max}]".

For dynamic or interactive queries that respond to user input, you can interpolate values into SQL queries using inline expressions ${…}. For example, to show the stars around a given brightness:

const mag = view(Inputs.range([6, 22], {label: "Magnitude"}));

SELECT * FROM gaia WHERE phot_g_mean_mag BETWEEN ${mag - 0.1} AND ${mag + 0.1};

The value of a SQL code block is an Apache Arrow table. This format is supported by Observable Plot, so you can use SQL and Plot together to visualize data. For example, below we count the number of stars in each 2°×2° bin of the sky (where ra is right ascension and dec is declination, representing a point on the celestial sphere in the equatorial coordinate system), and then visualize the resulting heatmap using a raster mark.

SELECT
  floor(ra / 2) * 2 + 1 AS ra,
  floor(dec / 2) * 2 + 1 AS dec,
  count() AS count
FROM
  gaia
GROUP BY
  1,
  2

Plot.plot({
  aspectRatio: 1,
  x: {domain: [0, 360]},
  y: {domain: [-90, 90]},
  marks: [
    Plot.frame({fill: 0}),
    Plot.raster(bins, {
      x: "ra",
      y: "dec",
      fill: "count",
      width: 360 / 2,
      height: 180 / 2,
      imageRendering: "pixelated"
    })
  ]
})

SQL literals

SQL fenced code blocks are shorthand for the sql tagged template literal. You can invoke the sql tagged template literal directly like so:

const [row] = await sql`SELECT random() AS random`;

row.random

The sql tag is useful for querying data within JavaScript, such as to query data for visualization without needing to create a separate SQL code block and giving the data a name. For example, below we use DuckDB to bin stars by brightness, and then visualize the bins as a histogram using a rect mark.

Plot.plot({
  x: {round: true, label: "phot_g_mean_mag"},
  marks: [
    Plot.axisY({tickFormat: (d) => d / 1000, label: "count (thousands)"}),
    Plot.rectY(await sql`
      SELECT FLOOR(phot_g_mean_mag / 0.2) * 0.2 AS mag1
           , mag1 + 0.2 AS mag2
           , COUNT() AS count
        FROM gaia
       WHERE phot_g_mean_mag IS NOT NULL
       GROUP BY 1
    `, {x1: "mag1", x2: "mag2", y: "count", tip: true})
  ]
})

The sql tag is available by default in Markdown. You can also import it explicitly as:

import {sql} from "npm:@observablehq/duckdb";

The sql tag is also useful for working around a current limitation of DuckDB-Wasm: prepared statements do not support array arguments. (Please upvote #447 if you run into this issue.) Instead of passing the array as a parameter, you can interpolate the array values directly into the SQL query.

const source_ids = [2028328031008716288n, 2076498116457016960n, 4315266827603868160n, 4123529214004874624n, 5312548578630777344n];

Inputs.table(await sql([`SELECT * FROM gaia WHERE source_id IN (${[source_ids]})`]))

When interpolating values into SQL queries, be careful to avoid SQL injection by properly escaping or sanitizing user input. The example above is safe only because source_ids are known to be numeric.