Published; edited Apr 15, 2021
md`# CSE 412 Assignment 2
Student: Yuchen Sun
Due: Apr 19
`
md`### Part 1: Data Selection

For this assignment, I will explore the World Bank climate change dataset, available [here](https://raw.githubusercontent.com/ZeningQu/World-Bank-Data-by-Indicators/master/climate-change/climate-change.csv).

This dataset contains 52 time series. One of the variables is \`Country Code\`, so the dataset covers multiple countries. In this exploration, I will only use the data collected for the **United States** to answer the following questions:
1. How do CO2 emissions change over time?
2. How does agricultural land as a % of total land change over time?
3. How does fishing, forestry and agriculture as a % of GDP change over time?

After some exploratory analysis of the data, I also want to explore the relationships between multiple variables.
`
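The selection of United States rows could be sketched in plain JavaScript as below. This is only an illustration: the country code value `"USA"`, the `Year` and `Value` fields, and the helper name `rowsForCountry` are assumptions, and the sample rows are made up rather than taken from the dataset.

```javascript
// Hypothetical helper: keep only the rows for one country.
// Assumes each row is a plain object with a `Country Code` field.
function rowsForCountry(rows, countryCode) {
  return rows.filter((row) => row["Country Code"] === countryCode);
}

// Made-up sample rows, only to illustrate the shape of the data.
const sampleCountryRows = [
  { "Country Code": "USA", Year: 2000, Value: 1 },
  { "Country Code": "CAN", Year: 2000, Value: 2 },
  { "Country Code": "USA", Year: 2001, Value: 3 },
];

// "USA" is an assumed code; check the actual values in the dataset.
const usRows = rowsForCountry(sampleCountryRows, "USA");
console.log(usRows.length);
```

The same predicate should carry over to a table library's filter step, since it only compares one field per row.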
md`### Part 2: Exploratory Visual Analysis
`
import {aq, op} from '@uwdata/arquero'
vg = require('vega-lite')
data_structure = await aq.loadCSV('https://raw.githubusercontent.com/YuchenS1/CSE412/main/A2/datastr.csv?token=AIK5BGABLL5PELNOB7SUMEDAQGY6G')
data_structure.view()
md`

Q. How do CO2 emissions change over time?

<figure>
<img src="${await FileAttachment("pic5.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
<figcaption>A dot plot of CO2 emissions over the years shows three missing values in the most recent years, recorded as 0. The missing values are highlighted in red. They will be removed when answering the question.</figcaption>
</figure>
`
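Removing values recorded as 0 could look roughly like the sketch below. It assumes a 0 in this series always means "missing", as the caption describes. The field names `year` and `co2`, the helper `dropMissing`, and the sample numbers are all made up for illustration.

```javascript
// Hypothetical helper: drop points whose value is 0, null, or not a finite number.
// Treating 0 as "missing" is an assumption that fits this series only.
function dropMissing(points, field) {
  return points.filter((p) => Number.isFinite(p[field]) && p[field] !== 0);
}

// Made-up sample: the last three years are recorded as 0.
const emissionsSample = [
  { year: 2010, co2: 10 },
  { year: 2011, co2: 11 },
  { year: 2012, co2: 0 },
  { year: 2013, co2: 0 },
  { year: 2014, co2: 0 },
];

const cleanedEmissions = dropMissing(emissionsSample, "co2");
console.log(cleanedEmissions.map((p) => p.year));
```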
md`
Q. How does agricultural land as a % of total land change over time?

<figure>
<img src="${await FileAttachment("pic1.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
<figcaption>A dot plot of agricultural land as a % of total land over the years shows 6 extreme values that are possibly outliers. All 6 points are highlighted in red. The bottom three are clearly misrecorded because they sit at 0%. The top three are also suspicious because they break the otherwise steady decreasing trend of the data.</figcaption>
</figure>
`
md`
Q. How does fishing, forestry and agriculture as a % of GDP change over time?

<figure>
<img src="${await FileAttachment("pic2.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
<figcaption>A dot plot of the contribution of agriculture, forestry and fishing to GDP shows many missing values prior to the 2000s. The missing data significantly reduce the value of the question. Therefore, I will avoid the GDP data from this dataset and set this question aside.</figcaption>
</figure>
`
md`
The dot plot of emissions over time shows that total emissions generally increase over the years. This could be due to population growth. To explore the relationship between total population and emissions, I first look at the population data by plotting it over time.

<figure>
<img src="${await FileAttachment("pic6.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
<figcaption>This plot shows huge spikes in the population data throughout the years. These spikes must be outliers because a population cannot suddenly increase and decrease like that.</figcaption>
</figure>
`
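One way to flag such spikes automatically is to compare each value with the median of its neighbours. The sketch below is not necessarily how the outliers were identified here. The window size, the 20% tolerance, the function names, and the sample values are all assumptions chosen for illustration.

```javascript
// Median of a list of numbers (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: flag a value as a spike when it differs from the
// median of its surrounding window by more than `tolerance` (a fraction).
function flagSpikes(values, halfWindow = 2, tolerance = 0.2) {
  return values.map((value, i) => {
    const start = Math.max(0, i - halfWindow);
    const end = Math.min(values.length, i + halfWindow + 1);
    const localMedian = median(values.slice(start, end));
    return Math.abs(value - localMedian) > tolerance * Math.abs(localMedian);
  });
}

// Made-up series with one obvious spike at index 3.
const populationSample = [100, 101, 102, 250, 104, 105, 106];
const spikeFlags = flagSpikes(populationSample);
console.log(spikeFlags);
```

A local median is used instead of a local mean so that the spike itself does not pull the reference value toward it.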
md`After cleaning the outliers out of the population and emissions data, I align both line plots along a shared x-axis, which denotes the year.

<figure>
<img src="${await FileAttachment("pic7.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
</figure>
`
md`Total CO2 emissions did not grow as smoothly as the total population over the years. This suggests that policies restricting CO2 emissions may have been in play as early as the 1970s, as shown by the first dip. In the most recent years, the plot shows that CO2 emissions growth has reached a plateau. This could be due to the global combined effort to reduce CO2 emissions from cars and power plants, including measures such as the EPA’s 2015 “Carbon Pollution Standard for New Power Plants”.`
md`It is also interesting to look at the decrease in agricultural land as a % of total land and how this relates to the urban population growth. Urban population growth is expected to triple from 2000 to 2030. The plot below shows urban population as a % of total population over the years.

<figure>
<img src="${await FileAttachment("pic8.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
<figcaption>This plot shows several values above 100%. Since the plot shows a share of the total population rather than % growth, values above 100% are impossible and are treated as outliers to be removed.</figcaption>
</figure>
`
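The range check described in the caption is simple to express in code. The sketch below assumes the urban share is stored as a number between 0 and 100. The helper name `isValidShare` and the sample values are made up.

```javascript
// Hypothetical helper: a share of a total must lie between 0% and 100%.
function isValidShare(value) {
  return Number.isFinite(value) && value >= 0 && value <= 100;
}

// Made-up sample: two values exceed 100% and cannot be real shares.
const urbanShareSample = [70.1, 73.6, 140.2, 75.3, 101];
const validUrbanShares = urbanShareSample.filter(isValidShare);
console.log(validUrbanShares);
```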
md`Again, I align two line graphs along the same x-axis, which denotes the year. There is a clear decreasing trend in agricultural land % and a clear increase in urban population %. It seems that more people are moving into cities and away from farms on agricultural land.
<figure>
<img src="${await FileAttachment("pic10.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
</figure>
`
md`From the two graphs above, we see a connection between the increase in population and the increase in total emissions over the years. We also see that the decrease in agricultural land relates to the growth of the urban population. We may also guess that the urban population causes more emissions than farmers on agricultural land. Urban populations invest in the economy to build factories, and they drive cars that burn fossil fuel; they have the money that drives emissions. This dataset also lets us look at how foreign investment relates to emissions. Foreign investment is another source of funding for constructing large power plants and factories, which are major contributors to CO2 emissions.
<figure>
<img src="${await FileAttachment("pic11.png").url()}" style="background: #d5d5d5; width: auto; height: auto; max-height: calc(0.7 * (100vw - 28px));">
</figure>

Looking at the year 2000, for example, we see a huge increase in foreign investment. Assuming this is not an outlier, we see a corresponding increase in CO2 emissions. Similar spikes in CO2 emissions occur around the other years with heavy foreign investment inflows to the United States.
`
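The visual comparison above could be backed by a number, for example the Pearson correlation between year-over-year changes in the two series. The sketch below is one possible approach, not the analysis performed here. The function names and the sample values are made up. Both series must cover the same years in the same order.

```javascript
// Year-over-year differences: [a, b, c] -> [b - a, c - b].
function yearOverYearChange(values) {
  return values.slice(1).map((value, i) => value - values[i]);
}

// Pearson correlation coefficient of two equal-length numeric arrays.
// Returns NaN if either array has zero variance.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson needs two arrays of equal length >= 2");
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Made-up series, only to show how the pieces fit together.
const investmentSample = [1, 3, 2, 6, 5];
const emissionsTrendSample = [10, 12, 11, 15, 14];
const changeCorrelation = pearson(
  yearOverYearChange(investmentSample),
  yearOverYearChange(emissionsTrendSample)
);
console.log(changeCorrelation);
```

Correlating changes rather than raw levels avoids crediting the relationship to the fact that both series simply trend upward over time. A high value would still not show that investment causes emissions.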
md`### Insights

Through this exploration, I first found that population has a positive relationship with total emissions. I then hypothesized that if the population were split into urban and rural subsets, the urban population would weigh much more heavily in producing emissions. However, while exploring the structure of the data, I realized that the rural population data contain only 3 non-zero data points, which makes any further exploration of that series meaningless. Instead, I looked at how the decrease in agricultural land % over the years can indicate a trend of increasing urban population and decreasing rural population. With this trend identified, I further hypothesized that the urban population produces more emissions because its members drive cars and fund factories or power plants that cause heavy CO2 emissions. Since the dataset doesn't contain any information on urban population investment, I used foreign investment inflow as a substitute and showed that an increase in investment does relate to an increase in CO2 emissions.

My exploration suggests that CO2 emissions are linked to the economy and to urbanization. I originally hoped to compare data from different countries, especially between developed and developing countries. However, because of the vast amount of missing data for developing countries, I was forced to limit my exploration to the United States. This exploration made me realize that because developing countries have a much shorter history of recording statistics, many time series are missing for them, which makes retrospective comparison studies difficult to conduct.

`