Published
Edited
Mar 4, 2022
# README
When I first received the data set, I had to preprocess it before my program could graph it. First, I processed each name, because the program does not accept names that contain spaces, so I replaced every space with an underscore. I also cleaned the entire table. This cleanup was not difficult, as I only needed to delete the entries with blank values. After all, I didn't want to include a school with a blank grade.
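The cleaning step described above can be sketched roughly as follows. This is only a minimal illustration, not the notebook's actual code: the column names and sample rows are hypothetical, and it assumes that both column names and school names need their spaces replaced.

```javascript
// Hypothetical sample rows; the real column names in the data set may differ.
const rawRows = [
  { "School Name": "Example High School", "Math Score": "500" },
  { "School Name": "Another School", "Math Score": "" },
];

// Replace every run of whitespace in a name with a single underscore.
function sanitizeName(name) {
  return String(name).trim().replace(/\s+/g, "_");
}

// True when a value is null, undefined, or an empty or whitespace-only string.
function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

// Drop rows that contain any blank field, then sanitize the column names
// and the values of the name column (assumption: both may contain spaces).
function cleanRows(rows, nameColumn) {
  return rows
    .filter((row) => Object.values(row).every((value) => !isBlank(value)))
    .map((row) => {
      const cleaned = {};
      for (const [key, value] of Object.entries(row)) {
        cleaned[sanitizeName(key)] =
          key === nameColumn ? sanitizeName(value) : value;
      }
      return cleaned;
    });
}

const cleanedRows = cleanRows(rawRows, "School Name");
```

Here the second sample row is dropped because its score is blank, and the surviving row has underscores in place of spaces.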

After cleaning, the table needed further processing because it does not directly provide all the data I wanted. It provides the math, reading, and writing scores, but not an overall score. An overall score seemed more informative to me than three separate subject scores, so I added the three scores together. This gave me the two values I needed for the data analysis: the total score, and the percentage of students tested, which was already included in the table.
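As a rough sketch of this step, the total can be computed per row by summing the three subject scores. The field names `math`, `reading`, `writing`, and `percentTested` are assumptions for illustration and may not match the real columns.

```javascript
// Hypothetical cleaned rows; field names are illustrative only.
const schoolScores = [
  { school: "School_A", math: "500", reading: "480", writing: "470", percentTested: "80" },
  { school: "School_B", math: "420", reading: "410", writing: "400", percentTested: "65" },
];

// Return a copy of each row with a numeric `total` field added.
// Number() is used because values read from a CSV are usually strings.
function addTotalScore(rows) {
  return rows.map((row) => ({
    ...row,
    total: Number(row.math) + Number(row.reading) + Number(row.writing),
  }));
}

const scoredRows = addTotalScore(schoolScores);
```

The original rows are left unchanged, so the three separate scores remain available if they are needed later.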

Beyond that, I needed some evidence that my sample was adequate. I used the contents of the first table to build a second table, in which I counted the number of schools in each region and the distribution of ethnicity in each region. This basic information about how the population is distributed gives some indication that the sample is large enough for the data analysis and that it is not one-sided.
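Counting schools per region, as described above, amounts to grouping the rows by one column. The following is a minimal sketch under the assumption that each row has a `region` field; the region names are placeholders.

```javascript
// Hypothetical rows; `region` is an assumed field name.
const schoolsByRow = [
  { school: "School_A", region: "North" },
  { school: "School_B", region: "South" },
  { school: "School_C", region: "North" },
];

// Count how many rows share each value of the given key.
function countBy(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    const group = row[key];
    counts.set(group, (counts.get(group) || 0) + 1);
  }
  return counts;
}

const schoolsPerRegion = countBy(schoolsByRow, "region");
```

The same grouping idea could be extended to average an ethnicity percentage column within each region.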

For the graphs, I chose pie (sector) charts and bar charts to represent the population, because this kind of data is about the distribution and the relative highs and lows rather than the exact number of students from each ethnic group in each school.

For the two variables under study, the percentage of students who took the exam and the total exam score, I chose a scatter (dot) plot. In statistics, scatter plots and linear regression are used to show the relationship between two numeric variables, and a scatter plot gives the reader a clearer view of how the points are distributed.

Linear regression is a statistical method for analyzing the relationship between two variables. It fits a straight line to the data, and the sign of the line's slope shows whether the two variables are positively or negatively correlated. It is a common method of data analysis, and I chose it after researching it online.
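To make the idea concrete, here is a minimal sketch of an ordinary least-squares fit together with the Pearson correlation coefficient. It is not necessarily how the notebook computes its regression, and the sample points are made up for illustration.

```javascript
// Fit y = slope * x + intercept by ordinary least squares and
// compute the Pearson correlation coefficient r.
// `points` is an array of [x, y] pairs with at least two distinct x values.
function linearRegression(points) {
  const n = points.length;
  const meanX = points.reduce((sum, [x]) => sum + x, 0) / n;
  const meanY = points.reduce((sum, [, y]) => sum + y, 0) / n;

  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  let syy = 0; // sum of (y - meanY)^2
  for (const [x, y] of points) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
    syy += (y - meanY) ** 2;
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  const r = sxy / Math.sqrt(sxx * syy);
  return { slope, intercept, r };
}

// Made-up example: x could be percent tested, y the total score.
const fit = linearRegression([
  [1, 2],
  [2, 4],
  [3, 6],
]);
```

A positive `slope` (and positive `r`) means the two variables rise together, while a negative value means one falls as the other rises.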