DuckDB
sql
tagged template literal. Use DuckDBClient
or DuckDB-Wasm directly, as shown here, if you need greater control.DuckDB is “an in-process SQL OLAP Database Management System. DuckDB-Wasm brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as duckdb
in Markdown, but you can explicitly import it as:
import * as duckdb from "npm:@duckdb/duckdb-wasm";
For convenience, we provide a DatabaseClient
implementation on top of DuckDB-Wasm, DuckDBClient
. This is also available by default in Markdown, but you can explicitly import it like so:
import {DuckDBClient} from "npm:@observablehq/duckdb";
To get a DuckDB client, pass zero or more named tables to DuckDBClient.of
. Each table can be expressed as a FileAttachment
, Arquero table, Arrow table, an array of objects, or a promise to the same. For file attachments, the following formats are supported: CSV, TSV, JSON, Apache Arrow, and Apache Parquet. For example, below we load a sample of 250,000 stars from the Gaia Star Catalog as a Parquet file:
const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});
Now we can run a query using db.sql
to bin the stars by right ascension (ra
) and declination (dec
):
const bins = db.sql`SELECT
floor(ra / 2) * 2 + 1 AS ra,
floor(dec / 2) * 2 + 1 AS dec,
count() AS count
FROM
gaia
GROUP BY
1,
2`
These bins can quickly be turned into a heatmap with Plot’s raster mark, showing the milky way.
Plot.plot({
aspectRatio: 1,
x: {domain: [0, 360]},
y: {domain: [-90, 90]},
marks: [
Plot.frame({fill: 0}),
Plot.raster(bins, {
x: "ra",
y: "dec",
fill: "count",
width: 360 / 2,
height: 180 / 2,
imageRendering: "pixelated"
})
]
})
You can also attach a complete database saved as DuckDB file, typically using the .db
file extension (or .ddb
or .duckdb
). In this case, the associated name (below base
) is a schema name rather than a table name.
const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
db2.queryRow(`SELECT COUNT() FROM base.events`)
For externally-hosted data, you can create an empty DuckDBClient
and load a table from a SQL query, say using read_parquet
or read_csv
. DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).
const db = await DuckDBClient.of();
await db.sql`CREATE TABLE addresses
AS SELECT *
FROM read_parquet('https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet')
LIMIT 100`;
As an alternative to db.sql
, there’s also db.query
:
db.query("SELECT * FROM gaia LIMIT 10")
db.sql
and db.query
methods return a promise to an Arrow table. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using Inputs.table
and pass the data to Plot.And db.queryRow
:
db.queryRow("SELECT count() AS count FROM gaia")
See the DatabaseClient Specification for more details on these methods.
Finally, the DuckDBClient.sql
method takes the same arguments as DuckDBClient.of
and returns the corresponding db.sql
tagged template literal. The returned function can be used to redefine the built-in sql
tagged template literal and thereby change the database used by SQL code blocks, allowing you to query dynamically-registered tables (unlike the sql front matter option).
const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/${feed}_day.csv`});
SELECT * FROM quakes ORDER BY updated DESC;