Course Project - Report / Daniel B. Papp

Edited Apr 26, 2023

## Design Decisions

As my goal with the visualization dashboard was to compare the speed of Formula 1 cars across years, it made sense to use the years as the x-axis of what was, at this point, just a theoretical idiom, and the speed of the race cars as the y-axis. I soon realized that I could not compare all years between 1950 and 2023, since there have been an overwhelming number of regulatory changes, technological advancements, and circuit variants. Due to this constraint, I had to filter my entire dataset on several parameters to limit data sparsity. After some exploratory data analysis, I settled on two parameters: a year interval that the races had to fall within and a set of tracks that held a race in most of the selected years. Following this process, I ended up with 12 tracks and 19 years to work with.
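The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the dashboard: the record layout (`year`, `track`), the function names, the `min_coverage` threshold, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def select_tracks(races, first_year, last_year, min_coverage=0.75):
    """Return tracks that hosted a race in at least `min_coverage`
    of the years in [first_year, last_year] (threshold is an assumption)."""
    n_years = last_year - first_year + 1
    years_by_track = defaultdict(set)
    for race in races:
        if first_year <= race["year"] <= last_year:
            years_by_track[race["track"]].add(race["year"])
    return sorted(
        track
        for track, years in years_by_track.items()
        if len(years) / n_years >= min_coverage
    )


def filter_races(races, first_year, last_year, tracks):
    """Keep only races inside the year interval and on the selected tracks."""
    keep = set(tracks)
    return [
        r for r in races
        if first_year <= r["year"] <= last_year and r["track"] in keep
    ]


# Toy records for illustration only; they are not taken from the real dataset.
races = (
    [{"year": y, "track": "Monza"} for y in range(2004, 2009)]
    + [{"year": y, "track": "Spa"} for y in (2004, 2005, 2007, 2008)]
    + [{"year": 2006, "track": "Istanbul"}, {"year": 1999, "track": "Monza"}]
)

tracks = select_tracks(races, 2004, 2008)
subset = filter_races(races, 2004, 2008, tracks)
print(tracks, len(subset))
```

Raising `min_coverage` leaves fewer tracks but fewer empty facets, which is the sparsity trade-off described in the paragraph above.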

After the entire data-wrangling process, I was still unsure how best to visualize the data and convince the viewer of the dashboard's goal. To get an idea of how other analysts visualize such data, I headed to my favorite search engine and found a Kaggle notebook [2] that used the same dataset but with a slightly different approach and objective. From the notebook, I could quickly identify how I would build such an idiom and which data points went into designing it. I opted to use the same faceted scatterplot as the notebook, but with a small subset of the data to make the dashboard more user-friendly. For the data information, this visualization follows the detailed dataset component design pattern: it shows a general overview of the data subset based on three significant values (speed, year, and track name). For the visual representation, it follows the detailed chart visualization pattern. For the composition design pattern, I ended up using the detail-on-demand pattern, as the dashboard is driven heavily by user interaction. Users can select constructors on any chart, highlight marks, and reduce the visibility of unimportant data points.

When it came to adding more charts to the dashboard, I wanted to identify any significant pattern in when constructors choose to make their fastest lap attempts to secure constructors' championship points. As described in the Visualizations section, I now realize that an idiom other than a scatterplot would have been more beneficial. A box plot measuring attempts within a range of laps would have made the idiom less busy, but it could have reduced consistency across the dashboard, as it would have been hard to categorize the box plot's data points by constructor name. Another way to visualize the data could have been a faceted histogram that compares the fastest lap attempts across laps in a race for all of the target tracks. I opted not to do something like the Kaggle notebook [2] because it would have increased the height of the dashboard significantly, and there was no way for me to layer the histogram over the first visualization, as one used the years as its x-axis and the other used the lap numbers as its x-axis.

Moving on, the bottom part of the dashboard was intended to give insight into what constructors go through over a season of races. To start, I worked out the data I would need to compare constructor performance through a given season. When it came to choosing the idiom, I dipped into yet another Kaggle notebook [3], where the author does a great job of showing how Lewis Hamilton won the Turkish GP. A multi-line chart was perfect for this visualization, as there were two ordinal attributes: the rounds of the season and the constructor's current standing in the season table. While this could have been enough to convey a message, I wanted the user to see how the two separate parts of the dashboard are connected, so the constructor's name is again encoded in the color channel.
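As a rough sketch of the data behind such a multi-line chart, the standing of each constructor after every round can be derived from per-round points. The input layout, the function name, and the team names and numbers below are hypothetical. Ties are broken alphabetically here for simplicity, which is an assumption and not how the championship actually resolves ties.

```python
def standings_by_round(points_per_round):
    """Map each constructor to its standing after every round (1 = leader).

    `points_per_round` maps a constructor name to the points it scored in
    round 1, round 2, ...; all lists are assumed to have the same length.
    """
    n_rounds = len(next(iter(points_per_round.values())))
    totals = {name: 0 for name in points_per_round}
    standings = {name: [] for name in points_per_round}
    for rnd in range(n_rounds):
        # Accumulate this round's points into the running totals.
        for name, pts in points_per_round.items():
            totals[name] += pts[rnd]
        # Rank by total points, highest first; ties fall back to the name.
        ordered = sorted(totals, key=lambda name: (-totals[name], name))
        for position, name in enumerate(ordered, start=1):
            standings[name].append(position)
    return standings


# Made-up points for three hypothetical teams over three rounds.
season = {
    "Team A": [18, 8, 10],
    "Team B": [10, 18, 6],
    "Team C": [6, 4, 18],
}
result = standings_by_round(season)
print(result)
```

Each list in the result is one line in the chart: the index is the round (x-axis) and the value is the standing (y-axis), with the constructor name available for the color channel.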

At this point, I was happy with what was already being shown, but there was no way for the viewer to identify the driving factors behind the final standing at the end of the season. It is well known that winning a race earns the constructor the most points, but teams have two drivers in each race, which gives them a huge opportunity to grab points from unexpected places. To shed extra light on the positions achieved by each team's drivers, I horizontally concatenated a scatterplot with the line chart and added a regression line. The point of the regression line was to show how drivers are expected to perform race after race to keep up with the other drivers. Marks placed above the regression line are generally considered good finishes compared to the rest of the drivers. As expected, this idiom also uses the color channel to group marks by constructor and decreases the opacity of unselected marks.
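To make the role of the regression line concrete, here is a minimal sketch of an ordinary least-squares fit and of labeling each mark by which side of the line it falls on. It assumes a plain linear fit and uses invented x and y values; the actual axes and the regression method used in the dashboard may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def side_of_line(xs, ys, slope, intercept):
    """Label each mark 'above' or 'below' relative to the fitted line."""
    return [
        "above" if y > slope * x + intercept else "below"
        for x, y in zip(xs, ys)
    ]


# Invented values: x could be the round number, y some per-race score.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
slope, intercept = fit_line(xs, ys)
labels = side_of_line(xs, ys, slope, intercept)
print(slope, intercept, labels)
```

A mark labeled "above" sits over the fitted trend for its x value, which is the comparison the paragraph above describes.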

To finish off the dashboard and give viewers more clarity on the constructors' championship battle, I used a bar chart idiom to show the difference between the points achieved by each constructor in a given season. Visualizing the points accumulated by each team puts into perspective how one-sided some Formula 1 seasons were. For example, in the 2004 season, Ferrari dominated by scoring more than double the points accumulated by the second-placed team, BAR.

## Development Process

Include a commentary on the development process, including answers to the following questions:

Roughly how much time did you spend developing your visualization?

What aspects took the most time?
