Data Wrangling Assessment / Interactive Info Vis

UW iSchool Course INFO 474: focused on designing and building visualizations to better understand and communicate about pressing issues.

Edited Jan 11, 2021

Fork of Data Wrangling Assessment

// Load the protests.csv data (a File Attachment) into a variable

protests = FileAttachment("protests.csv").csv()

// How many protests are in the dataset?

num_protests = protests.length

// How much information is available about each protest?

num_features = Object.keys(protests[0]).length

// Create an array of the number of attendees in each protest

// (make sure to return a *number* for each element in the array)

num_attendees = protests.map(d => +d.Attendees)
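
The unary `+` above converts each string to a number, but it also converts an empty string to 0 rather than to a missing value. The sketch below, in plain JavaScript rather than notebook cell syntax, illustrates the difference on made-up rows (`sampleRows` is hypothetical and not taken from protests.csv). Whether blank cells should count as 0 in this dataset is a judgment call the notebook leaves open.

```javascript
// Hypothetical rows standing in for parsed CSV records (not real data).
const sampleRows = [
  { Attendees: "120" },
  { Attendees: "" }, // a blank cell in the CSV
  { Attendees: "35" }
];

// Unary plus: "" becomes 0, so the blank row counts as zero attendees.
const coerced = sampleRows.map(d => +d.Attendees);

// Alternative: keep blanks as undefined so they can be skipped later.
const withMissing = sampleRows.map(d =>
  d.Attendees === "" ? undefined : +d.Attendees
);

console.log(coerced);     // [120, 0, 35]
console.log(withMissing); // [120, undefined, 35]
```

Summary functions such as d3.min and d3.mean ignore undefined values, so the two arrays can give different minimums and means.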

// What is the lowest number of attendees?

min_attendees = d3.min(num_attendees)

// What is the highest number of attendees?

max_attendees = d3.max(num_attendees)

// What is the mean number of attendees?

mean_attendees = d3.mean(num_attendees)

// What is the median number of attendees?

median_attendees = d3.median(num_attendees)

// What is the absolute difference between the mean and median number of attendees?

mean_median_diff = Math.abs(mean_attendees - median_attendees)
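
Mean and median can differ sharply when a few events are very large. As a rough illustration, here is the same calculation in plain JavaScript on made-up counts (the `counts` values are hypothetical), written without d3 so each step is visible.

```javascript
// Hypothetical attendee counts with one very large event (not real data).
const counts = [10, 20, 30, 40, 5000];

// Mean: sum of all values divided by how many there are.
const mean = counts.reduce((sum, x) => sum + x, 0) / counts.length;

// Median: middle value of a sorted copy (average of the two middles if even).
const sorted = [...counts].sort((a, b) => a - b);
const mid = Math.floor(sorted.length / 2);
const median = sorted.length % 2 === 1
  ? sorted[mid]
  : (sorted[mid - 1] + sorted[mid]) / 2;

// Math.abs keeps the difference non-negative whichever value is larger.
const diff = Math.abs(mean - median);
console.log(mean, median, diff); // 1020 30 990
```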

// Create an array that holds the Location of each protest

locations = protests.map(d => d.Location)

// How many *unique* locations are in the dataset?

num_locations = _.uniqBy(protests, d => d.Location).length

// How many protests occurred in Washington?

// (hint: locations that end with "WA")

num_in_wa = protests.filter(d => d.Location.endsWith("WA")).length

// What proportion of protests (number / total) occurred in Washington?

prop_in_wa = num_in_wa / protests.length

// How many protests occurred in each state?

// hint: use the d3.rollups method to create an *array of arrays*,

// where each one holds the state abbreviation and the number of protests in that state

// https://observablehq.com/@d3/d3-group

num_by_state = d3.rollups(protests, v => v.length, d => d.Location.slice(-2))
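
To make the "array of arrays" shape concrete, the sketch below does the same grouping by hand in plain JavaScript with a Map. The `sampleLocations` strings are hypothetical, and the sketch assumes, as the cell above does, that every Location ends with a two-letter state abbreviation.

```javascript
// Hypothetical Location strings (not real data).
const sampleLocations = ["Seattle, WA", "Portland, OR", "Tacoma, WA"];

// Count locations per state; a Map preserves first-seen key order.
const stateCounts = new Map();
for (const loc of sampleLocations) {
  const state = loc.slice(-2); // last two characters, e.g. "WA"
  stateCounts.set(state, (stateCounts.get(state) ?? 0) + 1);
}

// Convert to [state, count] pairs, the shape described above.
const byState = Array.from(stateCounts);
console.log(byState); // [["WA", 2], ["OR", 1]]
```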

// What was the highest number of protests that occurred in a single state?

highest_in_a_state = d3.max(num_by_state, d => d[1])

// Which state (two letter abbreviation) had the highest number of protests?

// hint: use the two variables calculated above

state_most_protests = num_by_state.filter(
  d => d[1] === highest_in_a_state
)[0][0]

// Create an array that holds the Date of each protest

// You should convert the string in the original dataset to a proper date object using the `new Date()` constructor

dates = protests.map(d => new Date(d.Date))

// What is the most recent date in the dataset?

most_recent = d3.max(dates)

// What is the earliest date in the dataset?

earliest = d3.min(dates)

// How many protests occurred in 2020?

// hint: use the getFullYear() method

num_in_2020 = dates.map(d => d.getFullYear()).filter(d => d === 2020).length

// How many protests occurred in 2019?

num_in_2019 = dates.map(d => d.getFullYear()).filter(d => d === 2019).length
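
One caveat when counting by year, assuming the Date column holds ISO date-only strings such as "2020-01-01" (an assumption; the file's format is not shown here): JavaScript parses that form as UTC midnight, while getFullYear() reports the year in local time, so a January 1 date can fall in the previous year in time zones behind UTC. A minimal sketch of a UTC-based count, using hypothetical date strings:

```javascript
// Hypothetical date strings in ISO date-only form (not real data).
const sampleDates = ["2019-12-31", "2020-01-01", "2020-07-04"].map(
  s => new Date(s)
);

// getUTCFullYear() reads the year in UTC, matching how date-only ISO
// strings are parsed, so the count does not depend on the time zone.
const in2020 = sampleDates.filter(d => d.getUTCFullYear() === 2020).length;
console.log(in2020); // 2
```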

// How many protests occurred in July (of any year)?

// Hint: you can use date.toLocaleDateString("default", { month: "long" }) to get the month name

num_in_july = dates
  .map(d => d.toLocaleDateString("default", { month: "long" }))
  .filter(d => d === "July").length

// How many protests occurred each month?

// hint: use the d3.rollups method to create an *array of arrays*,

// where each one holds the month name and the number of protests in that month

by_month = d3.rollups(
  dates,
  v => v.length,
  d => d.toLocaleDateString("default", { month: "long" })
)

// Which month had the highest number of protests?

month_most_protests = by_month.filter(
  d => d[1] === d3.max(by_month.map(d => d[1]))
)[0][0]

// How many different purposes are listed in the dataset?

// (the number of unique values of the `Event (legacy; see tags)` attribute)

num_purposes = _.uniqBy(protests, d => d["Event (legacy; see tags)"]).length

// That's quite a few -- if you look at the array, you'll notice

// a common pattern for each purpose. It's listed as:

// SOME_PURPOSE (additional_detail)

// To get a higher level summary, create an array of `high_level_purposes` by

// extracting *everything before the first parenthesis* in each purpose ("Event (legacy; see tags)")

// You should return an array equal to the length of the protests dataset

high_level_purposes = protests.map(
  d => d["Event (legacy; see tags)"].split("(")[0]
)
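
Splitting on "(" keeps whatever precedes the parenthesis, including a trailing space. The sketch below shows this on hypothetical labels (not taken from the dataset); whether to trim depends on how the results are compared later, so treat `.trim()` as an optional step rather than part of the exercise.

```javascript
// Hypothetical purpose labels (not real data).
const withDetail = "Civil Rights (Black Lives Matter)";
const noDetail = "Healthcare";

// Everything before the first "(" still ends with a space.
const before = withDetail.split("(")[0];

// trim() removes the leading and trailing whitespace.
const trimmed = before.trim();

// A label without parentheses comes back unchanged.
const unchanged = noDetail.split("(")[0];

console.log(JSON.stringify(before));    // "Civil Rights "
console.log(JSON.stringify(trimmed));   // "Civil Rights"
console.log(JSON.stringify(unchanged)); // "Healthcare"
```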

// Which high-level purpose was the most common?

// Hint: you may want to open a JavaScript statement to write multiple lines of code,

// or try out d3.greatest with d3.rollups

// https://observablehq.com/@d3/d3-least

most_common_purpose = d3.greatest(
  d3.rollups(high_level_purposes, v => v.length, d => d),
  d => d[1]
)[0]
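
Finding the most common entry means comparing the counts, not the [name, count] pairs themselves: comparing two arrays with `>` coerces them to strings, which ranks the pairs alphabetically by name. A small sketch in plain JavaScript, using hypothetical pairs shaped like d3.rollups output:

```javascript
// Hypothetical [purpose, count] pairs (not real data).
const pairs = [["Civil Rights", 3], ["Healthcare", 7], ["Immigration", 5]];

// Comparing whole arrays compares their string forms, e.g. "Healthcare,7".
const byString = pairs.reduce((best, p) => (p > best ? p : best));

// Comparing the counts (index 1) picks the most frequent purpose.
const byCount = pairs.reduce((best, p) => (p[1] > best[1] ? p : best));

console.log(byString[0]); // "Immigration" (alphabetically last)
console.log(byCount[0]);  // "Healthcare" (highest count)
```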

import { check_answer, displayCaution } with { answers } from "@uw-info474/utilities"

answers = await FileAttachment("answers.json").json({ typed: true })

new Date(answers.earliest)

d3 = require("d3")

_ = require("lodash")

REPLACE_ME = undefined
