Framework
1.13.0 GitHub️ 2.6k

DuckDB

The most convenient way to use DuckDB in Observable is the built-in SQL code blocks and sql tagged template literal. Use DuckDBClient or DuckDB-Wasm directly, as shown here, if you need greater control.

DuckDB is “an in-process SQL OLAP Database Management System. DuckDB-Wasm brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as duckdb in Markdown, but you can explicitly import it as:

import * as duckdb from "npm:@duckdb/duckdb-wasm";

For convenience, we provide a DatabaseClient implementation on top of DuckDB-Wasm, DuckDBClient. This is also available by default in Markdown, but you can explicitly import it like so:

import {DuckDBClient} from "npm:@observablehq/duckdb";

To get a DuckDB client, pass zero or more named tables to DuckDBClient.of. Each table can be expressed as a FileAttachment, Arquero table, Arrow table, an array of objects, or a promise to the same. For file attachments, the following formats are supported: CSV, TSV, JSON, Apache Arrow, and Apache Parquet. For example, below we load a sample of 250,000 stars from the Gaia Star Catalog as a Parquet file:

const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});

Now we can run a query using db.sql to bin the stars by right ascension (ra) and declination (dec):

const bins = db.sql`SELECT
  floor(ra / 2) * 2 + 1 AS ra,
  floor(dec / 2) * 2 + 1 AS dec,
  count() AS count
FROM
  gaia
GROUP BY
  1,
  2`

These bins can quickly be turned into a heatmap with Plot’s raster mark, showing the milky way.

Plot.plot({
  aspectRatio: 1,
  x: {domain: [0, 360]},
  y: {domain: [-90, 90]},
  marks: [
    Plot.frame({fill: 0}),
    Plot.raster(bins, {
      x: "ra",
      y: "dec",
      fill: "count",
      width: 360 / 2,
      height: 180 / 2,
      imageRendering: "pixelated"
    })
  ]
})

You can also attach a complete database saved as DuckDB file, typically using the .db file extension (or .ddb or .duckdb). In this case, the associated name (below base) is a schema name rather than a table name.

const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
db2.queryRow(`SELECT COUNT() FROM base.events`)

For externally-hosted data, you can create an empty DuckDBClient and load a table from a SQL query, say using read_parquet or read_csv. DuckDB offers many affordances to make this easier. (In many cases it detects the file format and uses the correct loader automatically.)

const db = await DuckDBClient.of();

await db.sql`CREATE TABLE addresses
  AS SELECT *
  FROM read_parquet('https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet')
  LIMIT 100`;

As an alternative to db.sql, there’s also db.query:

db.query("SELECT * FROM gaia LIMIT 10")
The db.sql and db.query methods return a promise to an Arrow table. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using Inputs.table and pass the data to Plot.

And db.queryRow:

db.queryRow("SELECT count() AS count FROM gaia")

See the DatabaseClient Specification for more details on these methods.

Finally, the DuckDBClient.sql method takes the same arguments as DuckDBClient.of and returns the corresponding db.sql tagged template literal. The returned function can be used to redefine the built-in sql tagged template literal and thereby change the database used by SQL code blocks, allowing you to query dynamically-registered tables (unlike the sql front matter option).

const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/${feed}_day.csv`});
SELECT * FROM quakes ORDER BY updated DESC;

Extensions

DuckDB extensions extend DuckDB’s functionality, adding support for additional file formats, new types, and domain-specific functions. For example, the json extension provides a read_json method for reading JSON files:

SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson');

To read a local file (or data loader), use FileAttachment and interpolation ${…}:

SELECT bbox FROM read_json(${FileAttachment("../quakes.json").href});

For convenience, Framework configures the json and parquet extensions by default. Some other core extensions also autoload, meaning that you don’t need to explicitly enable them; however, Framework will only self-host extensions if you explicitly configure them, and therefore we recommend that you always use the duckdb config option to configure DuckDB extensions. Any configured extensions will be automatically installed and loaded, making them available in SQL code blocks as well as the sql and DuckDBClient built-ins.

For example, to configure the spatial extension:

export default {
  duckdb: {
    extensions: ["spatial"]
  }
};

You can then use the ST_Area function to compute the area of a polygon:

SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area;

To tell which extensions have been loaded, you can run the following query:

FROM duckdb_extensions() WHERE loaded;

If the duckdb_extensions() function runs before DuckDB autoloads a core extension (such as json), it might not be included in the returned set.

Self-hosting of extensions

As with npm imports, configured DuckDB extensions are self-hosted, improving performance, stability, & security, and allowing you to develop offline. Extensions are downloaded to the DuckDB cache folder, which lives in .observablehq/cache/_duckdb within the source root (typically src). You can clear the cache and restart the preview server to re-fetch the latest versions of any DuckDB extensions. If you use an autoloading core extension that is not configured, DuckDB-Wasm will load it from the default extension repository, extensions.duckdb.org, at runtime.

Configuring

The second argument to DuckDBClient.of and DuckDBClient.sql is a DuckDBConfig object which configures the behavior of DuckDB-Wasm. By default, Framework sets the castBigIntToDouble and castTimestampToDate query options to true. To instead use BigInt:

const bigdb = DuckDBClient.of({}, {query: {castBigIntToDouble: false}});

By default, DuckDBClient.of and DuckDBClient.sql automatically load all configured extensions. To change the loaded extensions for a particular DuckDBClient, use the extensions config option. For example, pass an empty array to instantiate a DuckDBClient with no loaded extensions (even if your configuration lists several):

const simpledb = DuckDBClient.of({}, {extensions: []});

Alternatively, you can configure extensions to be self-hosted but not load by default using the duckdb config option and the load: false shorthand:

export default {
  duckdb: {
    extensions: {
      spatial: false,
      h3: false
    }
  }
};

You can then selectively load extensions as needed like so:

const geosql = DuckDBClient.sql({}, {extensions: ["spatial", "h3"]});

In the future, we’d like to allow DuckDB to be configured globally (beyond just extensions) via the duckdb config option; please upvote #1791 if you are interested in this feature.

Versioning

Framework currently uses DuckDB-Wasm 1.29.0 , which aligns with DuckDB 1.1.1. You can load a different version of DuckDB-Wasm by importing npm:@duckdb/duckdb-wasm directly, for example:

import * as duckdb from "npm:@duckdb/[email protected]";

However, you will not be able to change the version of DuckDB-Wasm used by SQL code blocks or the sql or DuckDBClient built-ins, nor can you use Framework’s support for self-hosting extensions with a different version of DuckDB-Wasm.