Edited Apr 23, 2023
md`# Ronan's second exercise on Observable: Classification and Colors`
d3 = require("d3@5")
topojson = require("topojson-client@3")
Georgia = FileAttachment("Georgia.json").json()
//Importing the JSON file. It is the same file as in assignment one and is uploaded in the same way.
county_features = topojson.feature(Georgia, Georgia.objects.Georgia)
//converting the TopoJSON object into a GeoJSON feature collection (the county features)
csv_data = d3.csvParse(await FileAttachment("FoodAccessResearchAtlasData2019.csv").text(),({CensusTract, Pop2010, TractSeniors}) => [CensusTract, (+TractSeniors/+Pop2010)*100])
//creating a new normalized variable while reading the csv file
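The normalization above divides the senior count by the total population of each tract. A minimal sketch of that calculation as a standalone function follows; `percentSeniors` and the sample rows are hypothetical names and made-up values, not part of the notebook. It assumes that tracts with a population of zero should yield `NaN` rather than a rate.

```javascript
// Hypothetical helper (not in the notebook): share of seniors in a tract, in percent.
function percentSeniors(tractSeniors, pop2010) {
  const seniors = Number(tractSeniors);
  const population = Number(pop2010);
  // A tract with zero or missing population has no meaningful rate.
  if (!Number.isFinite(seniors) || !Number.isFinite(population) || population <= 0) {
    return NaN;
  }
  return (seniors / population) * 100;
}

// Made-up rows for illustration only; the field names follow the CSV columns used above.
const sampleRows = [
  {CensusTract: "A", Pop2010: "2000", TractSeniors: "250"},
  {CensusTract: "B", Pop2010: "0", TractSeniors: "0"}
];

// Same [key, value] shape as csv_data.
const samplePairs = sampleRows.map(r => [r.CensusTract, percentSeniors(r.TractSeniors, r.Pop2010)]);
```

Guarding against a zero population keeps undefined rates out of the classification steps that follow, where they could otherwise distort the breaks.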
csv_data_objects = d3.csvParse(await FileAttachment("FoodAccessResearchAtlasData2019.csv").text(), d3.autoType).map(({CensusTract, Pop2010, TractSeniors}) => ({CensusTract: +CensusTract, PCTSeniors: (+TractSeniors / +Pop2010) * 100}))
//parsing the csv again, this time into an array of objects so that the histogram can read the PCTSeniors field by name
viewof bins = Inputs.range([0, 50], {step: 1, label: "Bins"})
//Choosing the number of bins for the histogram with a slider. I set the slider range to 50 since my dataset is large and a finer binning helps with visibility and analysis. My maximum percentage value is also 47.25.
//The histogram shows the distribution of the percentage of senior citizens within each Census tract. It is roughly normal, peaking at approximately 12 percent, which occurs in just over 300 Census tracts.
Plot.plot({
marks: [
Plot.rectY(csv_data_objects, Plot.binX({y: "count"}, {x: "PCTSeniors", thresholds: bins})),
Plot.ruleY([0])
]
})
//Code for the histogram
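To make the binning step concrete, here is a rough sketch of how values can be counted into equal-width bins. It is not how Plot computes its thresholds internally (Plot picks rounded tick values); `binCounts` and the sample values are hypothetical and only illustrate the idea of a count per bin.

```javascript
// Hypothetical helper: count values into `binCount` equal-width bins over [min, max].
function binCounts(values, binCount, [min, max]) {
  const width = (max - min) / binCount;
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // Skip missing values and anything outside the chosen range.
    if (!Number.isFinite(v) || v < min || v > max) continue;
    // The maximum value belongs to the last bin, not to a bin of its own.
    const i = Math.min(binCount - 1, Math.floor((v - min) / width));
    counts[i] += 1;
  }
  return counts;
}

// Made-up percentages for illustration only.
const sampleCounts = binCounts([1, 4, 12, 12.5, 13, 30, 47.25], 5, [0, 50]);
```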
seniorpct = Array.from(csv_data.values(), d => d[1])
//extracting the percentage values into a new array, "seniorpct"
data = Object.assign(new Map(csv_data), {title: ["Percent Population of Georgia: Senior Citizens"]})
md`# Linear Scale (Unclassed)`
linear = d3.scaleLinear()
.domain(d3.extent(seniorpct))
.range(["#eff3ff", "#08519c"])
//Unclassed data follows a linear scale since it does not group data into classes. Every data point is assigned its own color on a continuous scale between two end colors, so we need the minimum and maximum values of the data set for the domain. I chose the sequential blues color scheme from colorbrewer.com. Below, the squares do not correctly represent the entire data set as I have far too many observations; with the full data they would show a color progression from HEX #eff3ff (nearly white) to #08519c (blue).
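As an illustration of what the unclassed scale does, the sketch below interpolates each RGB channel linearly between the two end colors. It approximates the behavior of a linear scale with a color range and is not the d3 implementation itself; `hexToRgb`, `linearColor`, and the sample domain `[0, 50]` are assumptions made for the example.

```javascript
// Hypothetical helper: parse "#rrggbb" into [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Hypothetical helper: map a value in [min, max] to a color between two hex codes.
function linearColor(value, [min, max], [startHex, endHex]) {
  const t = (value - min) / (max - min); // position of the value within the domain, 0..1
  const start = hexToRgb(startHex);
  const end = hexToRgb(endHex);
  const [r, g, b] = start.map((c, i) => Math.round(c + (end[i] - c) * t));
  return `rgb(${r}, ${g}, ${b})`;
}

// Assumed domain for illustration; the notebook uses d3.extent(seniorpct) instead.
const sampleDomain = [0, 50];
const lightest = linearColor(0, sampleDomain, ["#eff3ff", "#08519c"]);
const darkest = linearColor(50, sampleDomain, ["#eff3ff", "#08519c"]);
```

The minimum of the domain returns the first color unchanged and the maximum returns the second, which is why the extent of the data is needed.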
chart(numericSort(seniorpct), linear)
md`# Quantile Classification`
quantile = d3.scaleQuantile()
.domain(seniorpct)
.range(["#eff3ff", "#6baed6", "#08519c"])
// In quantile classification the data is divided so that each class holds the same number of observations. With the three colors in the range above, if I had exactly 99 observations, 33 would fall into the first class, 33 into the second and 33 into the third. The color scheme is kept the same.
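A small sketch of how quantile breaks can be derived by hand may help here. It assumes linear interpolation between neighboring sorted values; `quantileSorted`, `quantileBreaks`, `quantileClassIndex`, and the sample values are hypothetical and only illustrate the idea behind the scale.

```javascript
// Hypothetical helper: the p-quantile of an ascending array, interpolating between neighbors.
function quantileSorted(sorted, p) {
  const position = (sorted.length - 1) * p;
  const lower = Math.floor(position);
  const upper = Math.min(lower + 1, sorted.length - 1);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower);
}

// Hypothetical helper: the classCount - 1 breaks that split the data into equally filled classes.
function quantileBreaks(values, classCount) {
  const sorted = values.filter(Number.isFinite).sort((a, b) => a - b);
  const breaks = [];
  for (let k = 1; k < classCount; k++) {
    breaks.push(quantileSorted(sorted, k / classCount));
  }
  return breaks;
}

// Hypothetical helper: index of the class a value falls into (0 = lowest class).
function quantileClassIndex(value, breaks) {
  let i = 0;
  while (i < breaks.length && value >= breaks[i]) i++;
  return i;
}

// Made-up values for illustration only: nine observations, three classes.
const sampleValues = [2, 4, 6, 8, 10, 12, 14, 16, 18];
const sampleBreaks = quantileBreaks(sampleValues, 3);
const classSizes = [0, 0, 0];
for (const v of sampleValues) classSizes[quantileClassIndex(v, sampleBreaks)] += 1;
```

With nine observations and three classes, each class ends up with three observations, which is the defining property of the method.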
chart(numericSort(seniorpct), quantile)
md`# Jenks Natural Breaks Classification`
naturalbreaks = simple.ckmeans(seniorpct, 5).map(v => v.pop())
//creating the natural breaks. I chose 5 classes based on the histogram I created further above and the color scheme I had chosen. I used the 5-class colorblind-safe red color scheme. As shown in the array above, the natural breaks occur at 6.47, 10.74, 15.04, 22.38 and 27.65 respectively.
jenks = d3
.scaleThreshold()
.domain(naturalbreaks)
.range(["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"])
//Natural breaks classification groups the data points into a set number of classes so that the values within each class are as similar as possible, with the goal of representing each class as accurately as possible and highlighting outliers. Its data-specific breaks make it more difficult to compare between maps. Once again my squares are only showing values within the first class.
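To show what "natural" means in practice, the sketch below compares two ways of grouping the same values by their total within-class squared deviation, the quantity a natural breaks method tries to keep small. It does not implement the ckmeans algorithm; `withinClassSS`, `upperBounds`, and the sample groups are hypothetical.

```javascript
// Hypothetical helper: sum of squared deviations from each group's own mean.
function withinClassSS(groups) {
  return groups.reduce((total, group) => {
    const mean = group.reduce((sum, v) => sum + v, 0) / group.length;
    return total + group.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  }, 0);
}

// Hypothetical helper: the largest value of each group, without mutating the groups.
function upperBounds(groups) {
  return groups.map(group => group[group.length - 1]);
}

// Made-up values for illustration only, split in two different ways.
const naturalGroups = [[1, 2, 3], [10, 11, 12]];   // split at the obvious gap
const arbitraryGroups = [[1, 2], [3, 10, 11, 12]]; // split one value too early

const naturalSS = withinClassSS(naturalGroups);
const arbitrarySS = withinClassSS(arbitraryGroups);
const naturalUpper = upperBounds(naturalGroups);
```

The grouping that splits at the gap has a far smaller within-class spread, which is why a natural breaks method would prefer it. `upperBounds` mirrors the `.pop()` call in the `naturalbreaks` cell, which keeps the last value of each cluster.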
chart(numericSort(seniorpct), jenks)
md`# Equal Interval Classification (Quantize)`
quantize = d3.scaleQuantize()
.domain([d3.min(seniorpct),d3.max(seniorpct)])
.range(["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"])
//This method groups data into classes whose breaks are equidistant from each other. Once again I chose five classes.
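The equal interval idea can be sketched in a few lines: the range between minimum and maximum is cut into classes of identical width. `equalIntervalBreaks`, `equalIntervalClass`, and the sample range `[0, 50]` are hypothetical and chosen for round numbers; the notebook uses the actual minimum and maximum of `seniorpct`.

```javascript
// Hypothetical helper: the classCount - 1 breaks that cut [min, max] into equal-width classes.
function equalIntervalBreaks(min, max, classCount) {
  const width = (max - min) / classCount;
  return Array.from({length: classCount - 1}, (_, k) => min + width * (k + 1));
}

// Hypothetical helper: index of the class a value falls into (0 = lowest class).
function equalIntervalClass(value, min, max, classCount) {
  const width = (max - min) / classCount;
  // The maximum itself belongs to the last class.
  return Math.min(classCount - 1, Math.max(0, Math.floor((value - min) / width)));
}

// Assumed range for illustration only.
const intervalBreaks = equalIntervalBreaks(0, 50, 5);
const classOfTwelve = equalIntervalClass(12, 0, 50, 5);
const classOfMax = equalIntervalClass(50, 0, 50, 5);
```

Unlike quantile classes, these classes have equal widths but may hold very different numbers of observations.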
chart(numericSort(seniorpct), quantize)
md`# Threshold`
chart(numericSort(seniorpct), threshold)
showScaleGrouping(seniorpct, {
scaleQuantile: quantile,
scaleThreshold: threshold,
scaleJenks: jenks,
scaleQuantize: quantize,
scaleQuantizeNice: quantize.copy().nice()
})
//The charts above show the distribution and color of each classification model nicely. Note how some have equally sized classes whereas others do not.
//Regarding the use of colors in Observable, I either gave two colours in a range function and allowed Observable to create a continuous scale between them, or I set a discrete colour for each class according to a sequential color scheme found on colorbrewer.org.
//If the code looks as below, the first HEX colour code, reading from left to right, refers to the first class determined by the domain, the second to the second and so on.
//.domain([3, 10, 15, 20, 50])
//.range(["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"])
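The mapping from breaks to colours can be sketched as a simple lookup. This assumes the rule used by threshold scales in d3, where a value equal to a break goes into the higher class and n breaks separate n + 1 classes; `thresholdColor` is a hypothetical helper, and the breaks and colours are the example values from the comment above.

```javascript
// Hypothetical helper: pick the colour for a value given ascending breaks.
function thresholdColor(value, breaks, colors) {
  let i = 0;
  // Count how many breaks the value has reached; that count is the class index.
  while (i < breaks.length && value >= breaks[i]) i++;
  return colors[i]; // undefined if there are fewer colours than classes
}

const exampleBreaks = [3, 10, 15, 20, 50];
const exampleColors = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"];

const belowFirstBreak = thresholdColor(2, exampleBreaks, exampleColors);
const middleValue = thresholdColor(12, exampleBreaks, exampleColors);
const highValue = thresholdColor(47.25, exampleBreaks, exampleColors);
const atLastBreak = thresholdColor(50, exampleBreaks, exampleColors);
```

Under this rule, five breaks paired with five colours leave values at or above the last break (50 here) without a colour. That does not matter for this data, since the maximum percentage is 47.25, but it is worth keeping in mind when reusing the pattern.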
md`# Annex`
simple = require("simple-statistics@7.0.7/dist/simple-statistics.min.js")