Published, edited Sep 28, 2019. Fork of Untitled.
md`The data set we chose covers every atomic bomb that has ever been detonated. It contains tests as well as the bombs used in warfare. We found this dataset on GitHub; it was created by Thomas Mock.`
d3 = require('d3@5')
d3.csv("https://raw.githubusercontent.com/zavenanarsh/tidytuesday/master/data/2019/2019-08-20/nuclear_explosions.csv")
md`The first step is to load the data and return only the attributes we need, under simplified names. We chose the attributes listed below. Because the upper and lower yield estimates are the same for almost all records, we use their mean as the yield for this assignment. We also convert the numeric fields from strings to numbers using +d.`
atomicBombs = d3.csv("https://raw.githubusercontent.com/zavenanarsh/tidytuesday/master/data/2019/2019-08-20/nuclear_explosions.csv", function(d) {
  // Convert each CSV row (all strings) into a record with short names and numeric fields.
  return {
    date: d.date_long,
    year: +d.year,
    yearString: d.year,
    id: +d.id_no,
    country: d.country,
    name: d.name,
    type: d.type,
    latitude: +d.latitude,
    longitude: +d.longitude,
    // Mean of the lower and upper yield estimates.
    yield: ((+d.yield_lower) + (+d.yield_upper)) / 2
  };
})
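md`As a rough illustration of the conversion step, the sketch below applies the same kind of row function to a single made-up row in plain JavaScript. The names rawRow and toBomb and all sample values are hypothetical and are not taken from the dataset.`

```javascript
// Hypothetical raw row: a parsed CSV record in which every value is still a string.
const rawRow = {
  date_long: "19450716",
  year: "1945",
  id_no: "45001",
  country: "USA",
  name: "TRINITY",
  type: "TOWER",
  latitude: "32.54",
  longitude: "-105.57",
  yield_lower: "21",
  yield_upper: "21"
};

// Same shape as the row function passed to d3.csv in the notebook.
function toBomb(d) {
  return {
    date: d.date_long,
    year: +d.year,          // unary plus turns "1945" into the number 1945
    yearString: d.year,     // kept as a string
    id: +d.id_no,
    country: d.country,
    name: d.name,
    type: d.type,
    latitude: +d.latitude,
    longitude: +d.longitude,
    yield: (+d.yield_lower + +d.yield_upper) / 2 // mean of the two estimates
  };
}

const bomb = toBomb(rawRow);
console.log(bomb.year, bomb.yield); // 1945 21

// Unary plus on an empty string gives 0; on other non-numeric text it gives NaN.
console.log(+"", +"n/a"); // 0 NaN
```

md`One consequence worth keeping in mind: if a yield field were empty in the CSV, unary plus would turn it into 0. We have not checked whether that happens in this dataset, so this is only a possible explanation for some of the zero yields.`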
md`1) These cells find the maximum and minimum yield of the atomic bombs. The maximum yield is 50,000 kilotonnes of TNT. The minimum yield is 0, which means either that the explosion was smaller than 1 kilotonne (still a lot for a non-nuclear bomb) or that the test failed. We did not know which case a 0 represents, so we decided not to eliminate these records. They are also explored in the "count how many records match a particular dimension criterion" section of this notebook.`
maxYield=d3.max(atomicBombs, d => d.yield)
minYield=d3.min(atomicBombs, d => d.yield)
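md`The cells above return only the extreme values. To see which record a maximum belongs to, one option is a small helper like the one sketched below. The helper maxBy and the sample records are hypothetical; in the notebook the input would be atomicBombs.`

```javascript
// Hypothetical sample records with made-up values.
const sampleBombs = [
  { country: "A", name: "Alpha", yield: 20 },
  { country: "B", name: "Bravo", yield: 50000 },
  { country: "A", name: "Charlie", yield: 0 }
];

// Return the record with the largest accessor value, or undefined for an empty array.
function maxBy(records, accessor) {
  let best;
  for (const d of records) {
    if (best === undefined || accessor(d) > accessor(best)) {
      best = d;
    }
  }
  return best;
}

const largest = maxBy(sampleBombs, d => d.yield);
console.log(largest.country, largest.name, largest.yield); // B Bravo 50000
```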
md`After finding the largest bomb, we wondered which country developed it, so we graphed the maximum yield per country in a bar chart to compare the results. This representation conveys the answer clearly. In addition, we created a bar chart that shows the minimum yield of the bombs per country.`
{
  let yieldByCountry = d3.nest()
    .key(d => d.country)
    .rollup(v => ({
      min: d3.min(v, d => d.yield),
      max: d3.max(v, d => d.yield),
      mean: d3.mean(v, d => d.yield),
    }))
    .entries(atomicBombs);
  return vl.markBar()
    .data(yieldByCountry.map(d => {
      return {
        country: d.key,
        max: d.value.max,
      }
    }))
    .encode(
      vl.x().fieldN('country').sort(vl.fieldQ('max').order('descending')),
      vl.y().fieldQ('max')
    )
    .render()
}

{
  let yieldByCountry = d3.nest()
    .key(d => d.country)
    .rollup(v => ({
      min: d3.min(v, d => d.yield),
      max: d3.max(v, d => d.yield),
      mean: d3.mean(v, d => d.yield),
    }))
    .entries(atomicBombs);
  return vl.markBar()
    .data(yieldByCountry.map(d => {
      return {
        country: d.key,
        min: d.value.min,
      }
    }))
    .encode(
      vl.x().fieldN('country').sort(vl.fieldQ('min').order('descending')),
      vl.y().fieldQ('min')
    )
    .render()
}
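md`The grouping done by d3.nest in the two cells above can be pictured with the plain JavaScript sketch below. It is not the d3 implementation, only an approximation that returns entries in the same key and value shape. The function summarizeBy and the sample records are hypothetical.`

```javascript
// Hypothetical sample records with made-up values.
const sampleRecords = [
  { country: "A", yield: 20 },
  { country: "B", yield: 50000 },
  { country: "A", yield: 0 }
];

// Group values by key, then reduce each group to its min, max and mean.
function summarizeBy(records, keyFn, valueFn) {
  const groups = new Map();
  for (const d of records) {
    const key = keyFn(d);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(valueFn(d));
  }
  return Array.from(groups, ([key, values]) => ({
    key,
    value: {
      min: Math.min(...values),
      max: Math.max(...values),
      mean: values.reduce((a, b) => a + b, 0) / values.length
    }
  }));
}

const summary = summarizeBy(sampleRecords, d => d.country, d => d.yield);
console.log(JSON.stringify(summary));
```

md`If the notebook is ever moved to a newer d3 release, note that d3.nest was dropped in d3 version 6 in favour of d3.group and d3.rollup, so a helper along these lines or those newer functions would be needed.`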
md`2) For the sum, we counted the atomic bombs each country has detonated and graphed the counts. In addition, we summed the total yield per country. The count graph shows that the USA has the highest number of atomic bombs. Comparing the count graph with the total-yield graph, it can be inferred that the USSR's bombs were on average more devastating, since the USSR has a higher total yield.`
md`To draw these graphs, we first created arrays containing only the attributes each graph needs. We used the length of each group to find the count and d3.sum to find the total yield.`
countByCountry = d3.nest()
  .key(d => d.country)
  .rollup(v => v.length)
  .entries(atomicBombs);
vl.markBar()
  .data(countByCountry.map(d => {
    return {
      country: d.key,
      count: d.value,
    }
  }))
  .encode(
    vl.x().fieldN('country').sort(vl.fieldQ('count').order('descending')),
    vl.y().fieldQ('count')
  )
  .render()
yieldByCountry = d3.nest()
  .key(d => d.country)
  .rollup(v => d3.sum(v, d => d.yield))
  .entries(atomicBombs);
vl.markBar()
  .data(yieldByCountry.map(d => {
    return {
      country: d.key,
      yield: d.value,
    }
  }))
  .encode(
    vl.x().fieldN('country').sort(vl.fieldQ('yield').order('descending')),
    vl.y().fieldQ('yield')
  )
  .render()
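md`The inference about average yield combines the two graphs: the average yield per bomb is the total yield divided by the count. The sketch below works through that arithmetic on made-up numbers. The function totalsByCountry and the sample records are hypothetical and do not reflect the real figures.`

```javascript
// Hypothetical sample: country A has more tests, country B has the larger total yield.
const sampleTests = [
  { country: "A", yield: 10 },
  { country: "A", yield: 30 },
  { country: "A", yield: 20 },
  { country: "B", yield: 100 },
  { country: "B", yield: 200 }
];

// Accumulate count and total yield per country, then derive the mean yield.
function totalsByCountry(records) {
  const totals = new Map();
  for (const d of records) {
    const t = totals.get(d.country) || { count: 0, totalYield: 0 };
    t.count += 1;
    t.totalYield += d.yield;
    totals.set(d.country, t);
  }
  return Array.from(totals, ([country, t]) => ({
    country,
    count: t.count,
    totalYield: t.totalYield,
    meanYield: t.totalYield / t.count
  }));
}

for (const row of totalsByCountry(sampleTests)) {
  console.log(row.country, row.count, row.totalYield, row.meanYield);
}
// A 3 60 20
// B 2 300 150
```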
md`3) We found the mean and median values of the yield and of the year in which the bombs were exploded. These values are also displayed indirectly in the histograms later in this notebook.`
yieldMedian = d3.median(atomicBombs, d => d.yield)
yieldMean = d3.mean(atomicBombs, d => d.yield)
md`Because the median and the mean of the yield differ, the data is skewed. Furthermore, because the mean is greater than the median, the data is positively skewed.`
yearMedian = d3.median(atomicBombs, d => d.year)
yearMean = d3.mean(atomicBombs, d => d.year)
md`Unlike for the yield, the median and the mean of the year are quite similar, so it is safe to assume that the data is fairly unskewed or only lightly positively skewed.`
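md`To make the mean versus median comparison concrete, the sketch below computes both in plain JavaScript for a small made-up list of yields in which one very large value pulls the mean far above the median. The functions mean and median and the sample values are hypothetical stand-ins for d3.mean, d3.median and the real data.`

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Median of a non-empty array: middle value, or the average of the two middle values.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical yields in kilotonnes: mostly small, with one extreme value.
const sampleYields = [0, 1, 5, 20, 50000];

console.log(median(sampleYields)); // 5
console.log(mean(sampleYields) > median(sampleYields)); // true
```

md`Comparing the mean with the median is a rule of thumb rather than a strict test of skewness, which is one reason the histograms are a useful cross-check.`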
md`4) We counted how many records match a particular dimension criterion, both for the yield and for the number of bombs in a given year.`
md`The code below calculates the total number of atomic bombs that have a yield greater than 10,000 kilotonnes of TNT.`
bombsOver10000 = atomicBombs.reduce((count, d) => {
  if (d.yield > 10000) {
    count += 1;
  }
  return count;
}, 0);
md`The code below calculates the total number of atomic bombs that were exploded in 1980.`
bombsIn1980 = atomicBombs.reduce((count, d) => {
  // year is already a number, so strict equality is safe here.
  if (d.year === 1980) {
    count += 1;
  }
  return count;
}, 0);
md`The code below calculates the total number of atomic bombs with a recorded yield of 0, meaning that the bomb either failed or had a yield smaller than 1 kilotonne of TNT.`
failedBombs = atomicBombs.reduce((count, d) => {
  // yield is already a number, so strict equality is safe here.
  if (d.yield === 0) {
    count += 1;
  }
  return count;
}, 0);
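md`The three counting cells above share one pattern: test each record against a criterion and count the matches. A compact way to express that pattern is sketched below. The helper countWhere and the sample records are hypothetical; in the notebook the input would be atomicBombs.`

```javascript
// Count the records for which predicate(d) is true.
function countWhere(records, predicate) {
  return records.filter(predicate).length;
}

// Hypothetical sample records with made-up values.
const sampleExplosions = [
  { year: 1980, yield: 0 },
  { year: 1980, yield: 150 },
  { year: 1961, yield: 50000 },
  { year: 1962, yield: 0.5 }
];

console.log(countWhere(sampleExplosions, d => d.yield > 10000)); // 1
console.log(countWhere(sampleExplosions, d => d.year === 1980)); // 2
console.log(countWhere(sampleExplosions, d => d.yield === 0));   // 1
console.log(countWhere(sampleExplosions, d => d.yield < 1));     // 2
```

md`The last two lines show that a test for exactly 0 and a test for less than 1 kilotonne can give different counts whenever fractional yields are present, so the choice of criterion matters.`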
md`6) We created two distinct kinds of histogram: one of the count of atomic bombs exploded, in intervals of 10 or 5 years, and one of the yield of the bombs over the years.`
import {vl} from '@vega/vega-lite-api'
vl.markBar()
  .data(atomicBombs)
  .encode(
    vl.x().fieldQ('year').bin({step:10}),
    vl.y().count()
  )
  .render()
md`The histograms with bin sizes of 10 and 5 years look different, but the overall shape is similar.`
vl.markBar()
  .data(atomicBombs)
  .encode(
    vl.x().fieldQ('year').bin({step:5}),
    vl.y().count()
  )
  .render()
md`This histogram shows the sum of the bomb yields for every 5-year interval. The peak makes sense, as that period was the height of the Cold War.`
vl.markBar()
  .data(atomicBombs)
  .encode(
    vl.x().fieldQ('year').bin({step:5}),
    vl.y().sum('yield')
  )
  .render()
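md`To show roughly what binning with a fixed step does, the sketch below assigns each year to the start of its interval and sums the yields per interval in plain JavaScript. It assumes bins aligned to multiples of the step, which may not match the exact boundaries Vega-Lite chooses. The functions binStart and sumByBin and the sample records are hypothetical.`

```javascript
// Start of the bin that contains the given year, for bins aligned to multiples of step.
function binStart(year, step) {
  return Math.floor(year / step) * step;
}

// Sum the yields of all records falling into each bin, sorted by bin start.
function sumByBin(records, step) {
  const bins = new Map();
  for (const d of records) {
    const start = binStart(d.year, step);
    bins.set(start, (bins.get(start) || 0) + d.yield);
  }
  return Array.from(bins, ([start, total]) => ({ start, end: start + step, total }))
    .sort((a, b) => a.start - b.start);
}

// Hypothetical sample records with made-up values.
const sampleByYear = [
  { year: 1958, yield: 10 },
  { year: 1961, yield: 50 },
  { year: 1962, yield: 40 },
  { year: 1971, yield: 5 }
];

console.log(JSON.stringify(sumByBin(sampleByYear, 5)));
console.log(JSON.stringify(sumByBin(sampleByYear, 10)));
```

md`With a step of 5 the record from 1958 falls into the interval starting at 1955, while with a step of 10 it falls into the interval starting at 1950, which is why the two bin sizes can look different while keeping a similar overall shape.`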
VegaLite = require("vega-embed@5")
md`In addition to the required content, we tried a scatter plot of yield against year to see whether the year influences the yield of the bombs. The problem with this graph is that the dots are too small, which makes it difficult to tell which country was responsible for each bomb. In addition, most of the data is packed closely into the bottom section of the graph, which makes it difficult to count the bombs.`
VegaLite({
  data: {values: atomicBombs},
  width: 800,
  height: 600,
  mark: "circle",
  encoding: {
    x: {timeUnit: "year", field: "yearString", type: "ordinal"},
    y: {field: "yield", type: "quantitative"},
    color: {field: "country", type: "nominal"}
  }
})
md`We also tried another form of representation, the stacked area chart. The chart below shows the number of bombs per year and per country. We referred to this link: "https://vega.github.io/vega-lite/docs/area.html#stacked-area-chart".`
VegaLite({
  data: {values: atomicBombs},
  width: 500,
  height: 500,
  mark: "area",
  encoding: {
    x: {
      field: "year",
      type: "quantitative",
    },
    y: {
      aggregate: "count",
      field: "yield",
      type: "quantitative"
    },
    color: {
      field: "country",
      type: "nominal",
      scale: {
        scheme: "set2"
      }
    }
  }
})
md`One challenge we faced was understanding how the dot notation works for Vega-Lite. We looked at the Vega library, and its sample code is all in JSON notation. After many attempts at guessing, we figured out how to sort the bars by value rather than alphabetically.`
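md`For comparison with the dot notation, the sketch below writes a sorted bar chart as a JSON-style Vega-Lite specification, the form used in the documentation samples. It is our best understanding of the equivalent spec rather than the exact output of the API, and the sample rows are hypothetical.`

```javascript
// Hypothetical aggregated rows, shaped like the arrays passed to the bar charts.
const sampleRows = [
  { country: "A", max: 20 },
  { country: "B", max: 50000 }
];

// JSON-style spec: sort the country axis by the max field, largest first,
// instead of the default alphabetical order.
const sortedBarSpec = {
  data: { values: sampleRows },
  mark: "bar",
  encoding: {
    x: {
      field: "country",
      type: "nominal",
      sort: { field: "max", op: "max", order: "descending" }
    },
    y: { field: "max", type: "quantitative" }
  }
};

console.log(JSON.stringify(sortedBarSpec.encoding.x.sort));
```

md`A specification like this could be passed to the VegaLite function used for the scatter plot and the area chart in this notebook.`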
md`Another challenge was learning how to share code through Observable. The team option is not free, so that was out of the question. We figured out that there is a way to fork a notebook, track who has forked your notebook, and fork the most recent one. This allowed us to fork each other's notebooks easily.`