Published
Edited
Jun 11, 2020
// Load the data from GitHub
state_data = d3.csv(
"https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
)
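When `d3.csv` is called without a row-conversion function, every field arrives as a string, which is why the later cells coerce values with `+d.cases` and `+d.deaths`. The sketch below shows the same coercion done once, in plain JavaScript. The sample rows are made up for illustration, and `toTypedRow` is a hypothetical helper, not part of the notebook.

```javascript
// Hypothetical sample rows, shaped like what d3.csv returns (all values are strings).
const rawRows = [
  { date: "2020-01-21", state: "Washington", cases: "1", deaths: "0" },
  { date: "2020-01-22", state: "Washington", cases: "1", deaths: "0" }
];

// Hypothetical helper: convert the numeric columns once, up front.
function toTypedRow(d) {
  return { date: d.date, state: d.state, cases: +d.cases, deaths: +d.deaths };
}

const typedRows = rawRows.map(toTypedRow);

// With numbers, "+" adds; with the raw strings it would concatenate instead.
console.log(typedRows[0].cases + typedRows[1].cases);
console.log(rawRows[0].cases + rawRows[1].cases);
```

In the notebook, a converter like this could be passed as the row argument of `d3.csv`, so that later cells would not need the `+` prefix.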
// Display the table
render_data_table(state_data.slice(0, 10))
md`- There are ${state_data.length} observations in my data`
md`- Each observation is the number of total (cumulative) cases confirmed by that day`
md`## Data Formatting + Questions
Because there is plenty of visualization code available on the web, much of the code you write will be data wrangling (reformatting, subsetting, renaming, etc.). One of the more challenging tasks is identifying the programming steps needed to answer a given question. Here are a few data-oriented questions that you should try to answer (feel free to use packages such as [D3](https://d3js.org/) or [underscore](https://underscorejs.org/), both of which provide functionality for working with data).

**See code below for calculating these values**

1. How many observations are there for Washington state? (Is it the same number as for New York state, and if not, why?)
- There are ${wa_data.length} observations in Washington, and ${
ny_data.length
} observations for New York (presumably because there were cases in WA before NY)
2. What is the total number of cases that have occurred in Washington state?
- ${total_wa} cases occurred in Washington state
3. How many unique states are present in the dataset?
- There are ${
unique_states.length
} states in the dataset (includes U.S. territories such as Guam)
4. Which state has had the most (total) cases?
- The state with the most cases is ${state_most_cases}, which has had ${most_cases} confirmed cases
5. Which state has had the most (total) deaths?
- The state with the most deaths is ${most_deaths.state}, which has had ${
most_deaths.deaths
} deaths


---
Some tougher questions...
1. Create a JavaScript variable that contains the total number of deaths in each state (structure is up to you -- there are a few ways to do it, but an _array of objects_ -- one for each state -- is a good option)
- See below.
2. What range of dates does this dataset cover (proper date formatting can be tough...)?
- The data ranges from ${date_formatter(
date_range.first_date
)} to ${date_formatter(date_range.last_date)}
3. Which state has had the highest _average_ number of cases per day? (_hint: create an array (dataset) where each element is a state, and each state has an array, with each element in that array indicating the number of **new** cases that day_).
- The state with the highest average number of cases per day is ${
highest_avg.state
}, with an average of ${Math.round(
highest_avg.avg_cases_per_day
)} cases per day (since the first case)
`
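A note on rounding the average in the markdown above: `Math.round` takes a single argument and returns the nearest integer, so any second argument is ignored. If one decimal place were wanted, a common approach is to scale, round, and unscale. The value below is made up for illustration.

```javascript
// Hypothetical average, not a value from the dataset.
const sampleAverage = 1234.5678;

// Nearest integer.
const roundedInteger = Math.round(sampleAverage);

// One decimal place: multiply by 10, round, then divide by 10.
const roundedOneDecimal = Math.round(sampleAverage * 10) / 10;

console.log(roundedInteger, roundedOneDecimal);
```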
// Filter down to just Washington
wa_data = state_data.filter(d => d.state == "Washington")
// Filter down to just New York
ny_data = state_data.filter(d => d.state == "New York")
// Get the total in WA (which is the max value, as these are cumulative)
total_wa = d3.max(wa_data, d => +d.cases)
// Get a unique list of states
unique_states = {
const all_names = state_data.map(d => d.state);
const unique_states = _.uniq(all_names);
return unique_states;
}
// Get observations for each state, then find the "max" (which will be the total number in that state)
cases_by_state = unique_states.map(state_name => {
const state_obs = state_data.filter(d => d.state == state_name);
const max_in_state = d3.max(state_obs, d => +d.cases);
return { state: state_name, cases: max_in_state };
})
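The cell above filters the full dataset once per state. As an alternative, the same per-state maximum can be collected in a single pass with a `Map`, which also yields the list of unique states as a by-product. This is a minimal sketch in plain JavaScript with made-up rows; the names `sampleRows`, `maxByState`, and `casesByStateSketch` are hypothetical.

```javascript
// Hypothetical cumulative rows (values are made up for illustration).
const sampleRows = [
  { state: "A", cases: "1" },
  { state: "B", cases: "5" },
  { state: "A", cases: "3" },
  { state: "B", cases: "9" }
];

// One pass: keep the largest cumulative count seen so far for each state.
const maxByState = new Map();
for (const d of sampleRows) {
  const cases = +d.cases;
  const previous = maxByState.get(d.state);
  if (previous === undefined || cases > previous) {
    maxByState.set(d.state, cases);
  }
}

// Convert to the same array-of-objects shape as cases_by_state.
const casesByStateSketch = Array.from(maxByState, ([state, cases]) => ({
  state,
  cases
}));

console.log(casesByStateSketch);
```

For a dataset of this size either approach is fast enough; the single pass mainly matters as the number of rows and states grows.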
// Get the highest number of cases
most_cases = d3.max(cases_by_state, d => d.cases)
// Get the name of the state with the most cases
state_most_cases = cases_by_state.filter(d => d.cases === most_cases)[0].state
// Calculating state w/most deaths
// Showing a different (simpler?) approach than the steps above for cases
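most_deaths = {
// Compute the maximum once, rather than once per row inside the filter
const max_deaths = d3.max(state_data, d => +d.deaths);
return state_data.find(d => +d.deaths === max_deaths);
}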
// Getting total deaths -- this isn't too hard, as the highest value is the cumulative number
total_deaths = unique_states.map(state_name => {
// Get observations for each state, then find the "max" (which will be the total number in that state)
const state_obs = state_data.filter(d => d.state == state_name);
const max_in_state = d3.max(state_obs, d => +d.deaths);
return { state: state_name, deaths: max_in_state };
})
// Compute the date range
date_range = {
// Get all of the dates possible
const all_dates = _.uniq(state_data.map(d => d.date));

// Calculate max and min, converting each one to a proper date object (see date formatter below)
const parser = d3.timeParse("%Y-%m-%d");
const min_date = d3.min(all_dates, d => parser(d));
const max_date = d3.max(all_dates, d => parser(d));
return { first_date: min_date, last_date: max_date };
}
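Because the dates are zero-padded `YYYY-MM-DD` strings, they already sort chronologically under plain string comparison, so the first and last dates can be found before any parsing. The sketch below illustrates this in plain JavaScript with made-up dates; `parseYmd` is a hypothetical stand-in for the `d3.timeParse("%Y-%m-%d")` parser used above and likewise returns a local-time `Date`.

```javascript
// Hypothetical date strings, deliberately out of order.
const sampleDates = ["2020-03-01", "2020-01-21", "2020-02-15"];

// Zero-padded YYYY-MM-DD strings compare chronologically as plain strings.
let firstDate = sampleDates[0];
let lastDate = sampleDates[0];
for (const s of sampleDates) {
  if (s < firstDate) firstDate = s;
  if (s > lastDate) lastDate = s;
}

// Hypothetical helper: build a local-midnight Date from the three parts.
function parseYmd(s) {
  const [year, month, day] = s.split("-").map(Number);
  return new Date(year, month - 1, day); // months are zero-based in Date
}

const dateRangeSketch = {
  first_date: parseYmd(firstDate),
  last_date: parseYmd(lastDate)
};

console.log(firstDate, lastDate);
```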
// To format the dates in the markdown above
date_formatter = d3.timeFormat("%B %d")
// Tough one here -- convert to cases per day by subtracting the previous day
avg_cases_per_day = unique_states.map(state_name => {
// Object to return
let obj = { state: state_name };
// Filter down the data to the current state, then construct the data object for this state
let this_state_data = state_data.filter(d => d.state == state_name);

const data = this_state_data.map((d, i) => {
// Calculate the *new* cases today by subtracting yesterday's cumulative count
// Note: this assumes the data is ordered by date, which it is (though sorting explicitly would be safer)
let new_cases = 0;
if (i === 0) new_cases = +d.cases;
else new_cases = +d.cases - +this_state_data[i - 1].cases;
return { date: d.date, new_cases: new_cases };
});
// Compute average cases per day in this state's data
obj.avg_cases_per_day = d3.sum(data, d => d.new_cases) / data.length;
// obj.data = data; // uncomment this to attach the "data" to each state object
return obj;
})
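The cell above relies on the rows already being in date order. A minimal sketch of the same differencing step with an explicit sort first, in plain JavaScript and with made-up values, might look like the following; `unsortedRows`, `sortedRows`, `newCases`, and `avgNewCases` are hypothetical names.

```javascript
// Hypothetical cumulative rows for one state, deliberately out of order.
const unsortedRows = [
  { date: "2020-01-23", cases: "6" },
  { date: "2020-01-21", cases: "1" },
  { date: "2020-01-22", cases: "4" }
];

// Sort a copy by date; YYYY-MM-DD strings order chronologically as strings.
const sortedRows = [...unsortedRows].sort((a, b) =>
  a.date < b.date ? -1 : a.date > b.date ? 1 : 0
);

// New cases each day = today's cumulative count minus yesterday's.
const newCases = sortedRows.map((d, i) => ({
  date: d.date,
  new_cases: i === 0 ? +d.cases : +d.cases - +sortedRows[i - 1].cases
}));

// Average of the daily new-case counts.
const avgNewCases =
  newCases.reduce((sum, d) => sum + d.new_cases, 0) / newCases.length;

console.log(newCases.map(d => d.new_cases), avgNewCases);
```

The daily differences telescope, so their sum equals the final cumulative count. The average is therefore the last cumulative value divided by the number of days observed (here 6 / 3).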
// Get the highest average cases per day
highest_avg = avg_cases_per_day.filter(
state =>
state.avg_cases_per_day ===
d3.max(avg_cases_per_day, d => d.avg_cases_per_day)
)[0]
md`## Appendix`
_ = require("underscore")
d3 = require("d3")
import {
displayImage,
render_data_table,
table_styles
} from "@info474/utilities"
table_styles
html`<style>
p code, li code {color: #c30771;}
ul > li {
color:#164eb6;
}
</style>

`