Ronan's second exercise on Observable: Classification and Colors / Ronan Kleu

Edited Apr 23, 2023

Fork of Second Exercise on Observable: Classification and Colors

md`# Ronan's second exercise on Observable: Classification and Colors`

d3 = require("d3@5")

topojson = require("topojson-client@3")

Georgia = FileAttachment("Georgia.json").json()

//Importing the Georgia JSON file (in TopoJSON format). It is the same file as in assignment one and is uploaded in the same way.

county_features = topojson.feature(Georgia, Georgia.objects.Georgia)

//Converting the TopoJSON object into GeoJSON features that can be drawn

csv_data = d3.csvParse(await FileAttachment("FoodAccessResearchAtlasData2019.csv").text(),({CensusTract, Pop2010, TractSeniors}) => [CensusTract, (+TractSeniors/+Pop2010)*100])

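//Creating a new normalized variable (seniors as a percentage of the 2010 tract population) while reading the csv file. Each row becomes a [CensusTract, percentage] pair.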
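
//A minimal sketch of the normalization above in plain JavaScript. If a tract had a 2010 population of zero, the division in the cell above would give NaN (0/0), so this version returns null in that case. The helper name percentSeniors and the sample rows are made up for illustration and are not part of the notebook or the data file.

```javascript
// Hypothetical helper (not part of the notebook): percentage of seniors in a tract.
// Returns null when the population is zero or not a number, instead of NaN or Infinity.
function percentSeniors(tractSeniors, pop2010) {
  const seniors = Number(tractSeniors);
  const population = Number(pop2010);
  if (!Number.isFinite(seniors) || !Number.isFinite(population) || population <= 0) {
    return null;
  }
  return (seniors / population) * 100;
}

// Made-up rows in the shape d3.csvParse returns (every field is a string).
const sampleRows = [
  {CensusTract: "13001950100", Pop2010: "2000", TractSeniors: "250"},
  {CensusTract: "13001950200", Pop2010: "0", TractSeniors: "0"}
];

// Same [CensusTract, percentage] pairs as csv_data, but with null for empty tracts.
const samplePairs = sampleRows.map(r => [r.CensusTract, percentSeniors(r.TractSeniors, r.Pop2010)]);
```

//Pairs with a null percentage could then be filtered out before the values are passed to a scale or a histogram.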

csv_data_objects = d3.csvParse(await FileAttachment("FoodAccessResearchAtlasData2019.csv").text(), d3.autoType).map(({CensusTract, Pop2010, TractSeniors}) => ({CensusTract: +CensusTract, PCTSeniors: (+TractSeniors / +Pop2010) * 100}))

//Building an array of objects with the fields CensusTract and PCTSeniors, which is the shape Plot needs for the histogram

viewof bins = Inputs.range([0, 50], {step: 1, label: "Bins"})

//Choosing the number of bins for the histogram. I chose 50 as the upper end of the range since my dataset is large and this helps with visibility and analysis; my maximum value is also 47.25 percent. The histogram shows the distribution of the percentage of senior citizens within each Census tract. It shows a relatively normal distribution peaking at approximately 12 percent, which occurs in just over 300 Census tracts.

Plot.plot({

marks: [

Plot.rectY(csv_data_objects, Plot.binX({y: "count"}, {x: "PCTSeniors", thresholds: bins})),

Plot.ruleY([0])

]

})

//Code for the histogram
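
//To illustrate what binning does, here is a minimal sketch that counts values into equal-width bins. It is not Plot's own algorithm (Plot may adjust the bin edges to round numbers), and the function name countBins and the sample values are made up for illustration.

```javascript
// Count values into `binCount` equal-width bins between the minimum and the maximum.
function countBins(values, binCount) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / binCount;
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // The maximum value is placed in the last bin rather than in a bin of its own.
    const index = width === 0 ? 0 : Math.min(Math.floor((v - min) / width), binCount - 1);
    counts[index] += 1;
  }
  return counts;
}

// Made-up percentages, counted into 5 bins of width 9 (from 2 to 47).
const sampleCounts = countBins([2, 4, 11, 12, 13, 25, 47], 5);
// sampleCounts is [2, 3, 1, 0, 1]
```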

seniorpct = Array.from(csv_data.values(), d => d[1])

//Extracting the percentage (the second element of each [CensusTract, percentage] pair) into the new array "seniorpct"

data = Object.assign(new Map(csv_data), {title: ["Percent Population of Georgia: Senior Citizens"]})

md`# Linear Scale (Unclassed)`

linear = d3.scaleLinear()

.domain(d3.extent(seniorpct))

.range(["#eff3ff", "#08519c"])

//Unclassed data follows a linear scale since the data is not grouped into classes. Every data point is assigned its own color on a continuous sequential color scale according to its value. We therefore need the minimum and maximum value in our data set, which d3.extent returns. I chose the sequential blues color scheme from colorbrewer2.org. Below, the squares do not correctly represent the entire data set as I have far too many observations; with fewer observations they would show a color progression from HEX #eff3ff (nearly white) to #08519c (blue).

chart(numericSort(seniorpct), linear)
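
//A minimal sketch of how a linear scale can turn a value into a color: the value is converted to a fraction between 0 and 1, and each RGB channel is interpolated by that fraction. This is an illustration in plain JavaScript, not d3's actual implementation; the helper names hexToRgb, rgbToHex and linearColor are made up.

```javascript
// Parse "#rrggbb" into [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Format [r, g, b] back into "#rrggbb".
function rgbToHex(rgb) {
  return "#" + rgb.map(c => c.toString(16).padStart(2, "0")).join("");
}

// Map a value in [min, max] to a color between startHex and endHex.
function linearColor(value, [min, max], [startHex, endHex]) {
  const t = (value - min) / (max - min); // 0 at the minimum, 1 at the maximum
  const start = hexToRgb(startHex);
  const end = hexToRgb(endHex);
  return rgbToHex(start.map((c, i) => Math.round(c + (end[i] - c) * t)));
}

// Example with an assumed domain of 0 to 100 percent.
const blues = ["#eff3ff", "#08519c"];
const lowColor = linearColor(0, [0, 100], blues);    // "#eff3ff"
const midColor = linearColor(50, [0, 100], blues);   // "#7ca2ce"
const highColor = linearColor(100, [0, 100], blues); // "#08519c"
```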

md`# Quantile Classification`

quantile = d3.scaleQuantile()

.domain(seniorpct)

.range(["#eff3ff", "#6baed6", "#08519c"])

//In quantile classification the sorted data is divided so that each class holds (as nearly as possible) the same number of observations. Because the range above has three colors, there are three classes: if I had exactly 99 observations, 33 would fall into the first class, 33 into the second and 33 into the third. The colors are taken from the same blue scheme.

chart(numericSort(seniorpct), quantile)
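
//A minimal sketch of quantile classification in plain JavaScript: the breaks are the interpolated quantiles of the sorted data, and a value belongs to the class given by the number of breaks it has reached. The function names and the sample data are made up for illustration, and the result may differ in detail from what d3.scaleQuantile computes.

```javascript
// Interpolated quantile of an already sorted array, for p between 0 and 1.
function quantileSorted(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

// The classCount - 1 breaks that split the data into equally filled classes.
function quantileBreaks(values, classCount) {
  const sorted = [...values].sort((a, b) => a - b);
  const breaks = [];
  for (let i = 1; i < classCount; i++) {
    breaks.push(quantileSorted(sorted, i / classCount));
  }
  return breaks;
}

// Index of the class a value falls into (0 for the lowest class).
function classIndex(value, breaks) {
  let index = 0;
  while (index < breaks.length && value >= breaks[index]) index++;
  return index;
}

// Nine made-up values split into three classes of three observations each.
const sampleValues = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const sampleBreaks = quantileBreaks(sampleValues, 3);
const classSizes = [0, 0, 0];
for (const v of sampleValues) classSizes[classIndex(v, sampleBreaks)] += 1;
// classSizes is [3, 3, 3]
```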

md`# Jenks Natural Breaks Classification`

naturalbreaks = simple.ckmeans(seniorpct, 5).map(v => v.pop())

//Creating the natural breaks. I chose 5 classes based on the histogram I created further above and on the color scheme I had chosen, the 5-class colorblind-safe red color scheme. As shown in the array above, the natural breaks occur at 6.47, 10.74, 15.04, 22.38 and 27.65 respectively.

jenks = d3

.scaleThreshold()

.domain(naturalbreaks)

.range(["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"])

//Natural breaks classification groups similar values together: it divides the data points into a given number of classes so that the values within each class are as close to one another as possible, which makes gaps and outliers in the data stand out. Because its breaks are specific to the data set, maps classified this way are more difficult to compare with each other. Once again my squares are only showing values within the first class.

chart(numericSort(seniorpct), jenks)
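
//A minimal sketch of the idea behind natural breaks: among all ways of cutting the sorted data into k consecutive classes, pick the one with the smallest total squared deviation from the class means. This plain JavaScript version is only meant for small arrays and is not the algorithm simple-statistics uses for ckmeans; the function names and sample data are made up for illustration.

```javascript
// Sum of squared deviations from the mean for sorted[i..j] (inclusive).
function sse(sorted, i, j) {
  let sum = 0;
  for (let t = i; t <= j; t++) sum += sorted[t];
  const mean = sum / (j - i + 1);
  let total = 0;
  for (let t = i; t <= j; t++) total += (sorted[t] - mean) ** 2;
  return total;
}

// Split values into k consecutive classes with minimal within-class variation.
// Assumes values.length >= k.
function naturalBreaksSketch(values, k) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  // cost[c][j]: smallest total SSE when sorted[0..j] is split into c + 1 classes.
  const cost = Array.from({length: k}, () => new Array(n).fill(Infinity));
  // split[c][j]: index where the last of those classes starts.
  const split = Array.from({length: k}, () => new Array(n).fill(0));
  for (let j = 0; j < n; j++) cost[0][j] = sse(sorted, 0, j);
  for (let c = 1; c < k; c++) {
    for (let j = c; j < n; j++) {
      for (let s = c; s <= j; s++) {
        const candidate = cost[c - 1][s - 1] + sse(sorted, s, j);
        if (candidate < cost[c][j]) {
          cost[c][j] = candidate;
          split[c][j] = s;
        }
      }
    }
  }
  // Walk back from the last class to the first to recover the classes.
  const classes = [];
  let end = n - 1;
  for (let c = k - 1; c >= 0; c--) {
    const start = c === 0 ? 0 : split[c][end];
    classes.unshift(sorted.slice(start, end + 1));
    end = start - 1;
  }
  return classes;
}

// Made-up values with three obvious groups.
const sampleClasses = naturalBreaksSketch([30, 1, 11, 2, 31, 10, 3, 32, 12], 3);
// sampleClasses is [[1, 2, 3], [10, 11, 12], [30, 31, 32]]

// Lower bound of every class after the first: [10, 30].
const sampleClassStarts = sampleClasses.slice(1).map(c => c[0]);
```

//One design note: a threshold scale places a value that is equal to a break in the upper class, so using the lowest value of each class after the first (as in sampleClassStarts) keeps every value in the class it was grouped into.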

md`# Equal Interval Classification (Quantize)`

quantize = d3.scaleQuantize()

.domain([d3.min(seniorpct),d3.max(seniorpct)])

.range(["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"])

//This method divides the data into classes whose breaks are equidistant from each other, so every class spans the same range of values. Once again I chose five classes.

chart(numericSort(seniorpct), quantize)
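
//A minimal sketch of how equal interval breaks can be computed: the distance between the minimum and the maximum is cut into classes of the same width. The function name equalIntervalBreaks and the example domain of 0 to 50 are assumptions for illustration.

```javascript
// The classCount - 1 breaks that cut [min, max] into classes of equal width.
function equalIntervalBreaks(min, max, classCount) {
  const breaks = [];
  for (let i = 1; i < classCount; i++) {
    breaks.push(min + (max - min) * i / classCount);
  }
  return breaks;
}

// Five classes between 0 and 50 percent, each 10 percentage points wide.
const sampleBreaks = equalIntervalBreaks(0, 50, 5);
// sampleBreaks is [10, 20, 30, 40]
```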

md`# Threshold`

threshold = d3.scaleThreshold()

.domain([3, 10, 15, 20])

.range(["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"])

//Assumption: the notebook as exported does not define "threshold", so the cells below that use it would fail. This definition is reconstructed from the domain and colors given in the closing note of this notebook; the value 50 is left out because five colors only need four breaks.

chart(numericSort(seniorpct), threshold)

showScaleGrouping(seniorpct, {

scaleQuantile: quantile,

scaleThreshold: threshold,

scaleJenks: jenks,

scaleQuantize: quantize,

scaleQuantizeNice: quantize.copy().nice()

})

//The charts above show the distribution and color of each classification model nicely. Note how some have equally sized classes whereas others do not. Regarding the use of colors in Observable, I either gave two colours in the range function and let D3 create a continuous scale between them, or I assigned a discrete colour to each class according to a sequential color scheme found on colorbrewer2.org. If the code looks as below, the first HEX colour code from left to right refers to the first class determined in the domain, the second to the second and so on.

//.domain([3, 10, 15, 20, 50])

//.range(["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"])
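
//A minimal sketch of how classes and colours are matched in a threshold scale: the colour is chosen by counting how many breaks the value has reached, so n colours need n - 1 breaks. This is plain JavaScript for illustration; the function name thresholdColor and the four example breaks are assumptions.

```javascript
// Return the colour of the class a value falls into.
// A value equal to a break belongs to the class above that break.
function thresholdColor(value, breaks, colors) {
  let index = 0;
  while (index < breaks.length && value >= breaks[index]) index++;
  return colors[index];
}

const exampleBreaks = [3, 10, 15, 20];
const exampleColors = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"];

const belowFirstBreak = thresholdColor(2, exampleBreaks, exampleColors);  // "#eff3ff"
const onFirstBreak = thresholdColor(3, exampleBreaks, exampleColors);     // "#bdd7e7"
const aboveLastBreak = thresholdColor(25, exampleBreaks, exampleColors);  // "#08519c"
```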

md`# Annex`

simple = require("simple-statistics@7.0.7/dist/simple-statistics.min.js")
