Observable Framework 1.6.0 GitHub️ 1.8k

DuckDB

The most convenient way to use DuckDB in Observable is the built-in SQL code blocks and sql tagged template literal. Use DuckDBClient or DuckDB-Wasm directly, as shown here, if you need greater control.

DuckDB is “an in-process SQL OLAP Database Management System. DuckDB-Wasm brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as duckdb in Markdown, but you can explicitly import it as:

import * as duckdb from "npm:@duckdb/duckdb-wasm";

For convenience, we provide a DatabaseClient implementation on top of DuckDB-Wasm, DuckDBClient. This is also available by default in Markdown, but you can explicitly import it like so:

import {DuckDBClient} from "npm:@observablehq/duckdb";

To get a DuckDB client, pass zero or more named tables to DuckDBClient.of. Each table can be expressed as a FileAttachment, Arquero table, Arrow table, an array of objects, or a promise to the same. For file attachments, the following formats are supported: CSV, TSV, JSON, Apache Arrow, and Apache Parquet. For example, below we load a sample of 250,000 stars from the Gaia Star Catalog as a Parquet file:

const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});

Now we can run a query using db.sql to bin the stars by right ascension (ra) and declination (dec):

const bins = db.sql`SELECT
  floor(ra / 2) * 2 + 1 AS ra,
  floor(dec / 2) * 2 + 1 AS dec,
  count() AS count
FROM
  gaia
GROUP BY
  1,
  2`

These bins can quickly be turned into a heatmap with Plot’s raster mark, showing the milky way.

Plot.plot({
  aspectRatio: 1,
  x: {domain: [0, 360]},
  y: {domain: [-90, 90]},
  marks: [
    Plot.frame({fill: 0}),
    Plot.raster(bins, {
      x: "ra",
      y: "dec",
      fill: "count",
      width: 360 / 2,
      height: 180 / 2,
      imageRendering: "pixelated"
    })
  ]
})

You can also attach a complete database saved as DuckDB file, typically using the .db file extension (or .ddb or .duckdb). In this case, the associated name (below base) is a schema name rather than a table name.

const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
db2.queryRow(`SELECT COUNT() FROM base.events`)

For externally-hosted data, you can create an empty DuckDBClient and load a table from a SQL query, say using read_parquet or read_csv. DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).

const db = await DuckDBClient.of();

await db.sql`CREATE TABLE addresses
  AS SELECT *
  FROM read_parquet('https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet')
  LIMIT 100`;

As an alternative to db.sql, there’s also db.query:

db.query("SELECT * FROM gaia LIMIT 10")
The db.sql and db.query methods return a promise to an Arrow table. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using Inputs.table and pass the data to Plot.

And db.queryRow:

db.queryRow("SELECT count() AS count FROM gaia")

See the DatabaseClient Specification for more details on these methods.

Finally, the DuckDBClient.sql method takes the same arguments as DuckDBClient.of and returns the corresponding db.sql tagged template literal. The returned function can be used to redefine the built-in sql tagged template literal and thereby change the database used by SQL code blocks, allowing you to query dynamically-registered tables (unlike the sql front matter option).

const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/${feed}_day.csv`});
SELECT * FROM quakes ORDER BY updated DESC;