Published
Edited
Mar 4, 2022
# Project 1
data = FileAttachment("scoresEdited@1.csv").csv();
data2 = FileAttachment("ScoresEdited2.csv").csv()
makeChart2(data2)
<div id="my_dataviz"></div>
makeChart2 = (dataset) => {
var width = 450;
var height = 450;
var margin = 40;

var radius = Math.min(width, height) / 2 - margin;

const chart2Svg = d3
.select("#my_dataviz")
.append("svg")
.attr("width", width)
.attr("height", height)
.append("g")
.attr("transform", `translate(${width / 2}, ${height / 2})`);

const data = { Manhattan: 89, Staten_Island: 10, Bronx: 98, Queens: 68, Brooklyn: 109 };

const color = d3.scaleOrdinal().range(d3.schemeSet2);

const pie = d3.pie().value(function (d) {
return d[1];
});
const data_ready = pie(Object.entries(data));

const arcGenerator = d3.arc().innerRadius(0).outerRadius(radius);

chart2Svg
.selectAll("mySlices")
.data(data_ready)
.join("path")
.attr("d", arcGenerator)
.attr("fill", function (d) {
return color(d.data[0]);
})
.attr("stroke", "black")
.style("stroke-width", "2px")
.style("opacity", 0.7);

chart2Svg
.selectAll("mySlices")
.data(data_ready)
.join("text")
.text(function (d) {
return d.data[0];
})
.attr("transform", function (d) {
return `translate(${arcGenerator.centroid(d)})`;
})
.style("text-anchor", "middle")
.style("font-size", "17px");

return chart2Svg.node();
}
makeChart4(data2)
<div id="my_dataviz3"></div>
makeChart4 = (dataset) => {
var width = 450;
var height = 450;
var margin = 40;

var radius = Math.min(width, height) / 2 - margin;

const chart3Svg = d3
.select("#my_dataviz3")
.append("svg")
.attr("width", width)
.attr("height", height)
.append("g")
.attr("transform", `translate(${width / 2}, ${height / 2})`);

const data = { Manhattan: 60965, Staten_Island: 13815, Bronx: 81082, Queens: 37392, Brooklyn: 92425 };

const color = d3.scaleOrdinal().range(d3.schemeSet2);

const pie = d3.pie().value(function (d) {
return d[1];
});
const data_ready = pie(Object.entries(data));

const arcGenerator = d3.arc().innerRadius(0).outerRadius(radius);

chart3Svg
.selectAll("mySlices")
.data(data_ready)
.join("path")
.attr("d", arcGenerator)
.attr("fill", function (d) {
return color(d.data[0]);
})
.attr("stroke", "black")
.style("stroke-width", "2px")
.style("opacity", 0.7);

chart3Svg
.selectAll("mySlices")
.data(data_ready)
.join("text")
.text(function (d) {
return d.data[0];
})
.attr("transform", function (d) {
return `translate(${arcGenerator.centroid(d)})`;
})
.style("text-anchor", "middle")
.style("font-size", "17px");

return chart3Svg.node();
}
Before we analyze the results, let's check the data set for major flaws.

The first graph shows the number of schools selected in each borough of New York City. The largest number of schools was selected in Brooklyn and the smallest in Staten Island. We need to validate this data by checking that the number of schools selected in each borough is roughly proportional to the number of students there. In our view, if a borough has very few schools but a large number of students, educational resources are unevenly distributed.

So we drew a second graph, which shows the number of students in each borough rather than the number of schools. Brooklyn again has the largest number of students, which is in line with the first graph, where Brooklyn has the most schools. This suggests that the data selection is sound and that there is no uneven distribution of educational resources of the kind described above.
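
One way to make the proportionality check concrete is to divide the student count by the school count for each borough. The sketch below is plain JavaScript and reuses the counts hard-coded in the two pie charts; the helper name `studentsPerSchool` is hypothetical and not part of the notebook.

```javascript
// Counts copied from the two pie-chart cells (makeChart2 and makeChart4).
const schools = { Manhattan: 89, Staten_Island: 10, Bronx: 98, Queens: 68, Brooklyn: 109 };
const students = { Manhattan: 60965, Staten_Island: 13815, Bronx: 81082, Queens: 37392, Brooklyn: 92425 };

// Hypothetical helper: one row per borough with its students-per-school ratio.
function studentsPerSchool(schoolCounts, studentCounts) {
  return Object.keys(schoolCounts).map((borough) => ({
    borough,
    schools: schoolCounts[borough],
    students: studentCounts[borough],
    ratio: studentCounts[borough] / schoolCounts[borough]
  }));
}

const ratios = studentsPerSchool(schools, students);

// Print one line per borough so the ratios can be compared side by side.
for (const row of ratios) {
  console.log(`${row.borough}: ${row.ratio.toFixed(1)} students per school`);
}
```

Boroughs whose ratio is far from the others are the ones worth a closer look before drawing conclusions from the rest of the analysis.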

<button id="button1">Manhattan</button>
<button id="button2">Staten_Island</button>
<button id="button3">Bronx</button>
<button id="button4">Queens</button>
<button id="button5">Brooklyn</button>
<div id="my_dataviz2"></div>
makeChart3(data)
makeChart3 = (dataset) => {
const data1 = [
{ group: "White", value: 8140 },
{ group: "Black", value: 22384 },
{ group: "Hispanic", value: 9320 },
{ group: "Asian", value: 9320 }
];

const data2 = [
{ group: "White", value: 1942 },
{ group: "Black", value: 2921 },
{ group: "Hispanic", value: 5511 },
{ group: "Asian", value: 3260 }
];

const data3 = [
{ group: "White", value: 17012 },
{ group: "Black", value: 19190 },
{ group: "Hispanic", value: 30877 },
{ group: "Asian", value: 12342 }
];

const data4 = [
{ group: "White", value: 2180 },
{ group: "Black", value: 9752 },
{ group: "Hispanic", value: 1427 },
{ group: "Asian", value: 3363 }
];

const data5 = [
{ group: "White", value: 9259 },
{ group: "Black", value: 27800 },
{ group: "Hispanic", value: 32934 },
{ group: "Asian", value: 20804 }
];

const margin = { top: 30, right: 30, bottom: 70, left: 60 },
width = 460 - margin.left - margin.right,
height = 400 - margin.top - margin.bottom;

const svg = d3
.select("#my_dataviz2")
.append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);

const x = d3
.scaleBand()
.range([0, width])
.domain(data1.map((d) => d.group))
.padding(0.2);
svg
.append("g")
.attr("transform", `translate(0,${height})`)
.call(d3.axisBottom(x));

const y = d3.scaleLinear().domain([0, 35000]).range([height, 0]);
svg.append("g").attr("class", "myYaxis").call(d3.axisLeft(y));

function update(data) {
var u = svg.selectAll("rect").data(data);

u.join("rect")
.transition()
.duration(1000)
.attr("x", (d) => x(d.group))
.attr("y", (d) => y(d.value))
.attr("width", x.bandwidth())
.attr("height", (d) => height - y(d.value))
.attr("fill", "#69b3a2");
}

update(data1);

document.getElementById("button1").onclick = function () {
update(data1);
};
document.getElementById("button2").onclick = function () {
update(data2);
};
document.getElementById("button3").onclick = function () {
update(data3);
};
document.getElementById("button4").onclick = function () {
update(data4);
};
document.getElementById("button5").onclick = function () {
update(data5);
};
return svg.node();
}
makeChart1(data)
makeChart1 = (dataset) => {
const yGap = 1;
const w = 1000;
const h = dataset.length * yGap;
const chart1Svg = d3.create("svg")
.attr('width', w)
.attr('height', h);

const xScale = d3
.scaleLinear()
.domain([800, 2400])
.rangeRound([0, w - 100]);
const yScale = d3
.scaleLinear()
.domain([100, 0])
.rangeRound([40, h - 40]);
const rScale = d3
.scaleSqrt()
// CSV fields are strings, so coerce to numbers before taking min and max.
.domain(d3.extent(dataset, (d) => +d.Student_Enrollment))
.rangeRound([1, 1.01]);

// Offsets matching the axis transforms below, so the marks line up with the axes.
const offsetX = 40;
const offsetY = 20;

// One circle per school: x is the total SAT score, y is the percentage tested.
chart1Svg
.selectAll("circle")
.data(dataset)
.join("circle")
.attr("fill","#a3a4a6")
.attr("cx", (d) => offsetX + xScale(d.SAT_Total))
.attr("cy",(d) => offsetY + yScale(d.Percent_Tested*100))
.attr("r", (d) => rScale(d.Student_Enrollment));
chart1Svg
.selectAll("text")
.data(dataset)
.join("text")
.text(d => d.app_name)
.attr("x", (d) => offsetX + xScale(d.SAT_Total))
.attr("y",(d) => offsetY + yScale(d.Percent_Tested*100))
.attr("font-family", "Arial")
.attr("fill","#f8e561");
chart1Svg
.append("g")
.attr("class", "axis")
.attr("transform", `translate(${offsetX}, ${h - 20})`)
.call(d3.axisBottom(xScale));
chart1Svg
.append("g")
.attr("class", "axis")
.attr("transform", `translate(${offsetX}, ${offsetY})`)
// Draw the y axis once, with percent tick labels.
.call(d3.axisLeft(yScale).tickSize(0).tickFormat((d) => d + "%"));
return chart1Svg.node();
}
FileAttachment("image.png").image()
Next, we analyze the SAT scores. I will compare two measures for each school: the average SAT score and the percentage of students who took the test. This comparison is interesting, but it has to be discussed in context.

The SAT is not a mandatory test for college admission in the United States. In China, by contrast, there is a test called the "college entrance exam" that is required of every high school student who wishes to go to college. Excluding special circumstances, the percentage of students taking the college entrance exam in any high school in China is almost always above 90%. In the United States, an SAT score is not a requirement for college, so students can choose whether to take the test. Generally speaking, a student who is confident of getting a satisfactory score is more likely to take it. Based on this reasoning, we formed a hypothesis:
For U.S. high schools, the higher the school's average test score, the higher the percentage of students taking the test.
So we selected these two parameters for comparison.

The chart plots the schools as scattered points. The scores are generally distributed around 1200 points, and the maximum score on this exam is 2400, so the vast majority of schools averaged about half of the maximum. A few schools scored over 2000 points, which is strikingly high.
To test the hypothesis, we started to analyze the data.

First of all, a large number of schools tested between 50% and 70% of their students, and their SAT scores are irregularly distributed. Some schools have over 90% of their students taking the SAT and achieve scores close to 2000. At the same time, there are schools where fewer than 30% of the students take the test and the scores are lower. But these are only a few cases, and the scatter plot alone does not let us draw a conclusion.

So I used Minitab to run a linear regression analysis.

When we use only one x to predict y, it is a univariate linear regression, which means that we are finding a straight line to fit the data. For example, if I have a scatter plot drawn from a set of data, with the horizontal coordinate representing the amount of advertising input and the vertical coordinate representing the sales volume, linear regression is finding a straight line and making that line fit the data points in the plot as closely as possible.
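
The idea of fitting a straight line can be sketched in a few lines of JavaScript using ordinary least squares. This is only an illustration of the technique, not the Minitab procedure used for the analysis; the function name `fitLine` and the sample points are made up for the example.

```javascript
// Hypothetical helper: ordinary least-squares fit of y = intercept + slope * x.
function fitLine(points) {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;

  // Sum of cross-deviations and sum of squared x-deviations.
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) ** 2;
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}

// Made-up sample: x could be advertising input, y the sales volume.
const samplePoints = [
  { x: 1, y: 3 },
  { x: 2, y: 5 },
  { x: 3, y: 7 },
  { x: 4, y: 9 }
];
const line = fitLine(samplePoints);
console.log(`y = ${line.intercept} + ${line.slope} * x`);
```

For the notebook's data, the points would be built from each row, for example with `x` taken from `Percent_Tested` and `y` from `SAT_Total`, assuming both columns parse as numbers.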

Here are the results of my analysis.

The graph shows a positive correlation between the percentage of students taking the test and the school's average score, but the correlation is not significant.
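
The strength of a linear relationship like this one can also be checked directly in the notebook with the Pearson correlation coefficient. The sketch below is an assumption about how that could be done in plain JavaScript; the function name `pearson` is hypothetical, and the Minitab output itself is not reproduced here.

```javascript
// Hypothetical helper: Pearson correlation coefficient of two equal-length arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, v) => sum + v, 0) / n;
  const meanY = ys.reduce((sum, v) => sum + v, 0) / n;

  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // Ranges from -1 (perfect negative) to 1 (perfect positive).
  return sxy / Math.sqrt(sxx * syy);
}

// Assumed usage with the notebook's rows (column names taken from makeChart1):
// pearson(data.map((d) => +d.Percent_Tested), data.map((d) => +d.SAT_Total))
```

A value close to 0 indicates a weak linear relationship, while values near 1 or -1 indicate a strong one.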

<h1><a href="https://observablehq.com/@frankluo426/readme">READ ME</a></h1>