Exploratory Data Analysis / Spring 2020 Info Vis

UW iSchool Course INFO 474: focused on designing and building visualizations to better understand and communicate about pressing issues.

Edited May 26, 2020

md`## Histograms, what are they good for?

Our dataset has a number of scalar variables that we wanted to analyze, so we built a modular histogram that can visualize any of them. The x-axis shows the values of the selected column, and the y-axis counts the songs that have each value. Some notable observations follow.

* Year: As one would expect, recent songs are heavily overrepresented among Spotify's most popular songs: the most recent year is over 8 times more frequent than the next highest, and only four years in total contain more than the mean number of songs. The oldest song, Bing Crosby's "White Christmas", is from 1942.

* BPM: BPM values are fairly evenly distributed, with frequency declining slightly as BPM increases. The most common BPMs are around the 90-100 mark.

* NRGY: NRGY measures how energetic a song is. It follows a roughly normal, left-skewed distribution with a mean of around 55.

* DNCE: Danceability follows a left-skewed, roughly normal distribution similar to NRGY's, but with a cluster of upper-third quantile songs at a dance value of 30.

* DB: dB, or loudness, follows a left-skewed, roughly normal distribution similar to NRGY and DNCE. Unlike those two variables, it shows no clusters within the distribution. As a result the quantiles are grouped together: the most popular upper third sits in the center, the middle third beside it, and the lower third on the tails.

* Live: LIVE, the measure of live performance, has a heavily right-skewed distribution, with a mean far to the right of almost all of the upper-third quantile songs. Very few songs have a high live value.

* Val: Valence measures how lively or positive a song is, and there is no clear trend in its distribution. The data is very slightly left-skewed, with a few low-valence songs in the 10-30 range bringing down the average.

* Dur: Dur (duration) is roughly normally distributed but skewed right by a few songs around the 350 mark and one large set of outliers around 450.

* Acous: Acous is the acousticness of a song. It is right-skewed, with a large cluster around the 0-20 values.

* SPCH: SPCH is the amount of speaking in a song and is heavily right-skewed. The vast majority of songs have a speechiness of 0 to 15, while the tail of the distribution reaches almost 60; a few very high-value songs from 30 to 60 bring the average up to 25.

* Pop: Pop is the popularity value of each song. It follows a left-skewed, roughly normal distribution, with a valley of uncommon popularity values around 55-65.`
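The skew and quantile observations above can be checked numerically as well as by eye. The following is a minimal sketch in plain JavaScript, not code from the notebook: `mean`, `median`, `skewDirection`, and `tercileLabels` are hypothetical helpers, the sample rows are made up, and comparing mean to median is only a rule of thumb for skew direction. It also assumes the "upper 1/3 quantile" groups are thirds ranked by `pop`, which is how the notes describe them.

```javascript
// Arithmetic mean of a non-empty numeric array.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median of a non-empty numeric array (average of the two middle values when even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Rough skew heuristic: a long right tail usually pulls the mean above the median,
// a long left tail pulls it below. This is a rule of thumb, not a formal test.
function skewDirection(values) {
  const diff = mean(values) - median(values);
  if (diff > 0) return "right";
  if (diff < 0) return "left";
  return "none";
}

// Labels each row "lower", "middle", or "upper" according to its rank on `key`.
function tercileLabels(rows, key) {
  const order = rows.map((row, i) => i).sort((a, b) => rows[a][key] - rows[b][key]);
  const labels = new Array(rows.length);
  order.forEach((rowIndex, rank) => {
    const third = Math.floor((3 * rank) / rows.length); // 0, 1, or 2
    labels[rowIndex] = ["lower", "middle", "upper"][third];
  });
  return labels;
}

// Made-up sample rows: one very speech-heavy song stretches the right tail.
const songs = [
  { spch: 3, pop: 40 }, { spch: 4, pop: 85 }, { spch: 5, pop: 62 },
  { spch: 6, pop: 71 }, { spch: 8, pop: 55 }, { spch: 46, pop: 90 }
];

console.log(skewDirection(songs.map(s => s.spch)));
console.log(tercileLabels(songs, "pop"));
```

A mean well above the median, as in the made-up `spch` values here, is the pattern the SPCH and Live notes describe.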

top50countryRaw = FileAttachment("top50contry.csv").text()

top50countryUnformatted = d3.csvParse(top50countryRaw)

top50country = _.cloneDeep(top50countryUnformatted).map(element => {
  // d3.csvParse returns every field as a string, so convert the numeric
  // columns to integers (radix 10) before plotting.
  element.year = parseInt(element.year, 10);
  element.bpm = parseInt(element.bpm, 10);
  element.nrgy = parseInt(element.nrgy, 10);
  element.dnce = parseInt(element.dnce, 10);
  element.dB = parseInt(element.dB, 10);
  element.live = parseInt(element.live, 10);
  element.val = parseInt(element.val, 10);
  element.dur = parseInt(element.dur, 10);
  element.acous = parseInt(element.acous, 10);
  element.spch = parseInt(element.spch, 10);
  element.pop = parseInt(element.pop, 10);
  return element;
})
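The conversions above all follow the same pattern, so they can also be driven from a list of column names. The following is a minimal sketch in plain JavaScript rather than the notebook's own code: `numericColumns` and `parseNumericColumns` are hypothetical names, the sample rows are made up, and the column list simply mirrors the fields converted above. It copies each row instead of mutating it and reports any value that fails to parse.

```javascript
// Hypothetical list: mirrors the fields converted in the notebook cell.
const numericColumns = ["year", "bpm", "nrgy", "dnce", "dB", "live",
                        "val", "dur", "acous", "spch", "pop"];

// Returns new row objects with the listed columns converted to integers,
// plus a list of [rowIndex, column] pairs whose values were not numeric.
function parseNumericColumns(rows, columns) {
  const invalid = [];
  const parsed = rows.map((row, i) => {
    const copy = { ...row }; // shallow copy so the raw rows stay untouched
    for (const column of columns) {
      const value = Number.parseInt(row[column], 10);
      if (Number.isNaN(value)) invalid.push([i, column]);
      copy[column] = value;
    }
    return copy;
  });
  return { parsed, invalid };
}

// Made-up sample rows in the shape d3.csvParse produces (all values are strings).
const sampleRows = [
  { title: "Song A", year: "2019", bpm: "96", nrgy: "55", dnce: "70", dB: "-6",
    live: "10", val: "40", dur: "210", acous: "12", spch: "5", pop: "88" },
  { title: "Song B", year: "2005", bpm: "", nrgy: "30", dnce: "35", dB: "-12",
    live: "8", val: "25", dur: "185", acous: "90", spch: "3", pop: "60" }
];

const result = parseNumericColumns(sampleRows, numericColumns);
console.log(result.parsed[0].dB); // a number rather than a string
console.log(result.invalid);      // the empty bpm in the second row is reported
```

Reporting unparseable cells matters because `parseInt` returns `NaN` silently, and a `NaN` value would otherwise be easy to miss when the rows are binned later.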

d3 = require("d3@5")

_ = require("lodash@4")

import { select, radio, checkbox } from "@jashkenas/inputs"

import { swatches } from "@d3/color-legend"

height = 500

margin = ({ top: 20, right: 30, bottom: 40, left: 50 })

marginAlt = ({ top: 40, right: 40, bottom: 40, left: 50 })
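The chart cell that consumes `height` and `margin` is not part of this excerpt. As a rough illustration of how such values typically feed a histogram, the sketch below bins a column by hand and maps each bin count to a y pixel inside the margins. It is written in plain JavaScript so it runs without dependencies; `binValues` and `yPixel` are hypothetical helpers and the BPM values are made up. d3@5 also provides `d3.histogram()` for the binning step, though whether the original chart uses it is not shown here.

```javascript
// Same values as the notebook's height and margin cells.
const height = 500;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };

// Counts values into `binCount` equal-width bins spanning [min, max].
// The maximum value is placed in the last bin rather than a bin of its own.
function binValues(values, binCount) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / binCount || 1; // avoid zero width when all values match
  const bins = Array.from({ length: binCount }, (unused, i) => ({
    x0: min + i * width,
    x1: min + (i + 1) * width,
    count: 0
  }));
  for (const v of values) {
    const index = Math.min(Math.floor((v - min) / width), binCount - 1);
    bins[index].count += 1;
  }
  return bins;
}

// Linear map from a count to a y pixel: a count of 0 sits on the bottom margin,
// the largest count sits on the top margin (SVG y grows downward).
function yPixel(count, maxCount) {
  const top = margin.top;
  const bottom = height - margin.bottom;
  return bottom - (count / maxCount) * (bottom - top);
}

const bpm = [72, 88, 91, 95, 96, 99, 104, 120, 128, 172]; // made-up values
const bins = binValues(bpm, 5);
const maxCount = Math.max(...bins.map(b => b.count));
for (const b of bins) {
  console.log(b.x0, b.x1, b.count, yPixel(b.count, maxCount));
}
```

Each bar would then be drawn from `yPixel(count, maxCount)` down to `height - margin.bottom`, which keeps the bars clear of the axis labels that occupy the margins.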
