Bug illustration / Benjamin Schmidt

Benjamin Schmidt

Digital Humanist. Manhattan-based.

Edited Jan 2, 2023

Fork of Trends in History dissertations / statistics of distinguishing words as SQL

chart = {
  // vals; // Don't do this until the better query is done.

  // Get the counts by year for the word we're interested in.
  // `word` is not defined in this excerpt; it is assumed to come from another cell.
  let v = await client.query(
    `WITH
    word_counts AS
      (SELECT year AS year, word, COUNT(*)::FLOAT count FROM
        (SELECT year, UNLIST(string_split_regex(LOWER(Dissertation), '[ ,\-]')) word FROM data) t1
      GROUP BY ALL),
    year_totals AS (SELECT year, SUM(count) as total FROM word_counts WHERE year > 1920 GROUP BY year),
    word_count AS (SELECT * FROM word_counts WHERE word = '${word}')
    SELECT year, word, COALESCE(count, 0) AS count, total, 100 * COALESCE(count, 0)::FLOAT/total::FLOAT AS rate FROM year_totals LEFT JOIN word_count USING ("year")
    ORDER BY year`
  );
  // Copy each result row into a plain object before plotting.
  v = v.map((row) => ({ ...row }));
  return Plot.plot({
    marks: [Plot.line(v, { x: "year", y: "rate" })]
  });
}
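To make the intent of the query easier to check by hand, here is a minimal plain-JavaScript sketch of the same per-year rate calculation. The `sampleRows` array and the `wordRateByYear` helper are hypothetical stand-ins, not part of the notebook, and the sketch assumes the SQL is meant to split titles on spaces, commas, and hyphens and to ignore years up to 1920.

```javascript
// Hypothetical sample rows standing in for the `data` table.
const sampleRows = [
  { year: 1950, Dissertation: "Hair and fashion" },
  { year: 1950, Dissertation: "The chair-makers of Boston" },
  { year: 1960, Dissertation: "Trade in Boston" },
  { year: 1900, Dissertation: "Hair trade" } // skipped: year is not > 1920
];

// Hypothetical helper mirroring the CTEs: word_counts, year_totals, word_count.
function wordRateByYear(rows, word, minYear = 1920) {
  const totals = new Map(); // year -> number of tokens in that year
  const hits = new Map(); // year -> occurrences of `word` in that year
  for (const { year, Dissertation } of rows) {
    if (!(year > minYear)) continue;
    // Same delimiters as the SQL regex: space, comma, hyphen.
    const tokens = Dissertation.toLowerCase().split(/[ ,\-]/);
    totals.set(year, (totals.get(year) ?? 0) + tokens.length);
    const n = tokens.filter((t) => t === word).length;
    hits.set(year, (hits.get(year) ?? 0) + n);
  }
  // Like the LEFT JOIN, every year with a total appears, even with zero hits.
  return [...totals.keys()]
    .sort((a, b) => a - b)
    .map((year) => {
      const count = hits.get(year) ?? 0;
      const total = totals.get(year);
      return { year, count, total, rate: (100 * count) / total };
    });
}

const rates = wordRateByYear(sampleRows, "hair");
```

With these sample rows, 1950 has eight tokens and one occurrence of "hair", and 1960 has three tokens and none, so both years appear in the output and the second has a rate of zero.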

client.query(
  "SELECT Dissertation, year FROM data WHERE Dissertation LIKE '%hair%' AND year > 2010 LIMIT 10"
)
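One thing to keep in mind when comparing this query with the chart: `LIKE '%hair%'` is a case-sensitive substring match, while the chart query lowercases each title and matches whole tokens. The sketch below shows the difference on a few hypothetical titles. It is only an illustration of the two matching rules and is not claimed to be the bug this notebook was written to show.

```javascript
// Hypothetical titles, not taken from the dissertation data.
const titles = [
  "Hair and fashion",
  "The chair-makers of Boston",
  "Trade in Boston"
];

// Substring match, analogous to: Dissertation LIKE '%hair%' (case-sensitive).
const substringMatches = titles.filter((t) => t.includes("hair"));

// Token match, analogous to the chart query: lowercase, split, compare whole words.
const tokenMatches = titles.filter((t) =>
  t.toLowerCase().split(/[ ,\-]/).includes("hair")
);
```

Here the substring rule picks up only the "chair-makers" title (capitalized "Hair" does not match), while the token rule picks up only "Hair and fashion", so the two queries can return different rows for the same word.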


DuckDBClient

client = DuckDBClient.of({ data: FileAttachment("all-dissertations@2.parquet") })
