DuckDB
sql
tagged template literal. Use DuckDBClient
or DuckDB-Wasm directly, as shown here, if you need greater control.DuckDB is “an in-process SQL OLAP Database Management System. DuckDB-Wasm brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as duckdb
in Markdown, but you can explicitly import it as:
import * as duckdb from "npm:@duckdb/duckdb-wasm";
For convenience, we provide a DatabaseClient
implementation on top of DuckDB-Wasm, DuckDBClient
. This is also available by default in Markdown, but you can explicitly import it like so:
import {DuckDBClient} from "npm:@observablehq/duckdb";
To get a DuckDB client, pass zero or more named tables to DuckDBClient.of
. Each table can be expressed as a FileAttachment
, Arquero table, Arrow table, an array of objects, or a promise to the same. For file attachments, the following formats are supported: CSV, TSV, JSON, Apache Arrow, and Apache Parquet. For example, below we load a sample of 250,000 stars from the Gaia Star Catalog as a Parquet file:
const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});
Now we can run a query using db.sql
to bin the stars by right ascension (ra
) and declination (dec
):
const bins = db.sql`SELECT
floor(ra / 2) * 2 + 1 AS ra,
floor(dec / 2) * 2 + 1 AS dec,
count() AS count
FROM
gaia
GROUP BY
1,
2`
These bins can quickly be turned into a heatmap with Plot’s raster mark, showing the milky way.
Plot.plot({
aspectRatio: 1,
x: {domain: [0, 360]},
y: {domain: [-90, 90]},
marks: [
Plot.frame({fill: 0}),
Plot.raster(bins, {
x: "ra",
y: "dec",
fill: "count",
width: 360 / 2,
height: 180 / 2,
imageRendering: "pixelated"
})
]
})
You can also attach a complete database saved as DuckDB file, typically using the .db
file extension (or .ddb
or .duckdb
). In this case, the associated name (below base
) is a schema name rather than a table name.
const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
db2.queryRow(`SELECT COUNT() FROM base.events`)
For externally-hosted data, you can create an empty DuckDBClient
and load a table from a SQL query, say using read_parquet
or read_csv
. DuckDB offers many affordances to make this easier. (In many cases it detects the file format and uses the correct loader automatically.)
const db = await DuckDBClient.of();
await db.sql`CREATE TABLE addresses
AS SELECT *
FROM read_parquet('https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet')
LIMIT 100`;
As an alternative to db.sql
, there’s also db.query
:
db.query("SELECT * FROM gaia LIMIT 10")
db.sql
and db.query
methods return a promise to an Arrow table. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using Inputs.table
and pass the data to Plot.And db.queryRow
:
db.queryRow("SELECT count() AS count FROM gaia")
See the DatabaseClient Specification for more details on these methods.
Finally, the DuckDBClient.sql
method takes the same arguments as DuckDBClient.of
and returns the corresponding db.sql
tagged template literal. The returned function can be used to redefine the built-in sql
tagged template literal and thereby change the database used by SQL code blocks, allowing you to query dynamically-registered tables (unlike the sql front matter option).
const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/${feed}_day.csv`});
SELECT * FROM quakes ORDER BY updated DESC;
Extensions
DuckDB extensions extend DuckDB’s functionality, adding support for additional file formats, new types, and domain-specific functions. For example, the json
extension provides a read_json
method for reading JSON files:
SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson');
To read a local file (or data loader), use FileAttachment
and interpolation ${…}
:
SELECT bbox FROM read_json(${FileAttachment("../quakes.json").href});
For convenience, Framework configures the json
and parquet
extensions by default. Some other core extensions also autoload, meaning that you don’t need to explicitly enable them; however, Framework will only self-host extensions if you explicitly configure them, and therefore we recommend that you always use the duckdb config option to configure DuckDB extensions. Any configured extensions will be automatically installed and loaded, making them available in SQL code blocks as well as the sql
and DuckDBClient
built-ins.
For example, to configure the spatial
extension:
export default {
duckdb: {
extensions: ["spatial"]
}
};
You can then use the ST_Area
function to compute the area of a polygon:
SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area;
To tell which extensions have been loaded, you can run the following query:
FROM duckdb_extensions() WHERE loaded;
If the duckdb_extensions()
function runs before DuckDB autoloads a core extension (such as json
), it might not be included in the returned set.
Self-hosting of extensions
As with npm imports, configured DuckDB extensions are self-hosted, improving performance, stability, & security, and allowing you to develop offline. Extensions are downloaded to the DuckDB cache folder, which lives in .observablehq/
within the source root (typically src
). You can clear the cache and restart the preview server to re-fetch the latest versions of any DuckDB extensions. If you use an autoloading core extension that is not configured, DuckDB-Wasm will load it from the default extension repository, extensions.duckdb.org
, at runtime.
Configuring
The second argument to DuckDBClient.of
and DuckDBClient.sql
is a DuckDBConfig
object which configures the behavior of DuckDB-Wasm. By default, Framework sets the castBigIntToDouble
and castTimestampToDate
query options to true. To instead use BigInt
:
const bigdb = DuckDBClient.of({}, {query: {castBigIntToDouble: false}});
By default, DuckDBClient.of
and DuckDBClient.sql
automatically load all configured extensions. To change the loaded extensions for a particular DuckDBClient
, use the extensions config option. For example, pass an empty array to instantiate a DuckDBClient with no loaded extensions (even if your configuration lists several):
const simpledb = DuckDBClient.of({}, {extensions: []});
Alternatively, you can configure extensions to be self-hosted but not load by default using the duckdb config option and the load: false
shorthand:
export default {
duckdb: {
extensions: {
spatial: false,
h3: false
}
}
};
You can then selectively load extensions as needed like so:
const geosql = DuckDBClient.sql({}, {extensions: ["spatial", "h3"]});
In the future, we’d like to allow DuckDB to be configured globally (beyond just extensions) via the duckdb config option; please upvote #1791 if you are interested in this feature.
Versioning
Framework currently uses DuckDB-Wasm 1.29.0 , which aligns with DuckDB 1.1.1. You can load a different version of DuckDB-Wasm by importing npm:@duckdb/duckdb-wasm
directly, for example:
import * as duckdb from "npm:@duckdb/[email protected]";
However, you will not be able to change the version of DuckDB-Wasm used by SQL code blocks or the sql
or DuckDBClient
built-ins, nor can you use Framework’s support for self-hosting extensions with a different version of DuckDB-Wasm.