SQL
Framework includes built-in support for client-side SQL powered by DuckDB. You can use SQL to query data from CSV, TSV, JSON, Apache Arrow, Apache Parquet, and DuckDB database files, which can either be static or generated by data loaders.
To use SQL, first register the desired tables in the page’s front matter using the sql option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named gaia
from a Parquet file:
---
sql:
gaia: ./lib/gaia-sample.parquet
---
To load externally-hosted data, use a full URL:
---
sql:
quakes: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv
---
You can also register tables via code (say to have sources that are defined dynamically via user input) by defining the sql
symbol with DuckDBClient.sql. To register DuckDB extensions, use the duckdb config option.
SQL code blocks
To run SQL queries, create a SQL fenced code block (```sql
). For example, to query the first 10 rows from the gaia
table:
```sql
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
This produces a table:
To refer to the results of a query in JavaScript, use the id
directive. For example, to refer to the results of the previous query as top10
:
```sql id=top10
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
This returns an array of 10 rows as an Apache Arrow table, inspected here:
top10
When a SQL code block uses the id
directive, the results are not displayed by default. You can display them by adding the display
directive, which produces the table shown above.
```sql id=top10 display
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
The id
directive is often a simple identifier such as top10
above, but it supports destructuring assignment, so you can refer to individual rows and columns using array and object patterns. For example, to pull out the top row:
```sql id=[top]
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 1
```
top
Or to pull out the minimum value of the phot_g_mean_mag
column:
```sql id=[{min}]
SELECT MIN(phot_g_mean_mag) AS min FROM gaia
```
min
For complex destructuring patterns, you may need to quote the id
directive. For example, to pull out the column named min(phot_g_mean_mag)
to the variable named min
, say id="[{'min(phot_g_mean_mag)': min}]"
. Or to pull out the min
and max
columns, say id="[{min, max}]"
.
For dynamic or interactive queries that respond to user input, you can interpolate values into SQL queries using inline expressions ${…}
. For example, to show the stars around a given brightness:
const mag = view(Inputs.range([6, 22], {label: "Magnitude"}));
SELECT * FROM gaia WHERE phot_g_mean_mag BETWEEN ${mag - 0.1} AND ${mag + 0.1};
The value of a SQL code block is an Apache Arrow table. This format is supported by Observable Plot, so you can use SQL and Plot together to visualize data. For example, below we count the number of stars in each 2°×2° bin of the sky (where ra
is right ascension and dec
is declination, representing a point on the celestial sphere in the equatorial coordinate system), and then visualize the resulting heatmap using a raster mark.
SELECT
floor(ra / 2) * 2 + 1 AS ra,
floor(dec / 2) * 2 + 1 AS dec,
count() AS count
FROM
gaia
GROUP BY
1,
2
Plot.plot({
aspectRatio: 1,
x: {domain: [0, 360]},
y: {domain: [-90, 90]},
marks: [
Plot.frame({fill: 0}),
Plot.raster(bins, {
x: "ra",
y: "dec",
fill: "count",
width: 360 / 2,
height: 180 / 2,
imageRendering: "pixelated"
})
]
})
SQL literals
SQL fenced code blocks are shorthand for the sql
tagged template literal. You can invoke the sql
tagged template literal directly like so:
const [row] = await sql`SELECT random() AS random`;
row.random
The sql
tag is useful for querying data within JavaScript, such as to query data for visualization without needing to create a separate SQL code block and giving the data a name. For example, below we use DuckDB to bin stars by brightness, and then visualize the bins as a histogram using a rect mark.
Plot.plot({
x: {round: true, label: "phot_g_mean_mag"},
marks: [
Plot.axisY({tickFormat: (d) => d / 1000, label: "count (thousands)"}),
Plot.rectY(await sql`
SELECT FLOOR(phot_g_mean_mag / 0.2) * 0.2 AS mag1
, mag1 + 0.2 AS mag2
, COUNT() AS count
FROM gaia
WHERE phot_g_mean_mag IS NOT NULL
GROUP BY 1
`, {x1: "mag1", x2: "mag2", y: "count", tip: true})
]
})
The sql
tag is available by default in Markdown. You can also import it explicitly as:
import {sql} from "npm:@observablehq/duckdb";
The sql
tag is also useful for working around a current limitation of DuckDB-Wasm: prepared statements do not support array arguments. (Please upvote #447 if you run into this issue.) Instead of passing the array as a parameter, you can interpolate the array values directly into the SQL query.
const source_ids = [2028328031008716288n, 2076498116457016960n, 4315266827603868160n, 4123529214004874624n, 5312548578630777344n];
Inputs.table(await sql([`SELECT * FROM gaia WHERE source_id IN (${[source_ids]})`]))
When interpolating values into SQL queries, be careful to avoid SQL injection by properly escaping or sanitizing user input. The example above is safe only because source_ids
are known to be numeric.