Observable transforms exploratory data analysis for data science students

Data Educator Stories explores the practices and projects of educators who use Observable in the classroom. The goal of this series is to grow our collective knowledge and build connections between the creative work of individuals and the broader education community.

The first discussion in our series is a chat with Philip Bogden, Observable Ambassador and computer science professor at Northeastern University. He’s currently using Observable in several graduate-level data science courses, including Introduction to Programming for Data Science, which prepares students with non-STEM backgrounds to complete a Master’s in Data Science, and an advanced course called Cutting Edge Data Visualization with Web Technologies, which teaches students how to develop interactive visualizations with D3, Plot, and other technologies.

How do you use Observable in the classroom? Is there one course in particular where you find it valuable?

All of my courses involve projects where students develop practical applications for various stakeholders. The disciplines range from geophysics to healthcare and they all need data visualization of some kind. Interactive visualizations that invite users to explore their data are the game changers, and many involve maps. Custom interactive visualizations are hard to create in Python and R, which are the standard languages for data science these days. And that’s where Observable comes in. It’s easier than ever to create innovative visualizations with Observable. In the old days (several years ago), it was hard to justify using JavaScript to teach data science, especially when students were still on the steep part of the learning curve with Python or R. But not anymore. Observable is proving transformative for teaching at all levels, and it’s getting better all the time, especially for student projects.

"Observable is transformative for the kind of exploratory data analysis that I liken to pulling a rabbit out of a hat. It’s most fun when students discover something new in data that they thought they already understood." - Philip Bogden, Associate Teaching Professor of Computer Science, Northeastern University

What kind of data do your students work with in Observable? Where does it come from and how does it help you teach?

Projects involve all kinds of data from AI research to business applications – but it’s not just about data. There are three requirements for a student project: data, a story, and a stakeholder. Stakeholders are important because they help define and refine the story based on what the students find in the data. Students with full-time jobs or internships often bring their own data, in which case the stakeholder might be an executive in the company where they work. This is all part of Northeastern’s experiential learning model.

Students use visualizations to communicate the information in the data to their stakeholders and Observable enters the picture at various stages. For students who know some JavaScript, Observable is great for exploratory analysis. For those students who only know Python or R, I get them interested in interactive visualizations using Observable’s amazing embedding API.

For example, students in one class created an interactive web app that displays citizen reports of toxic smells in their neighborhoods. The stakeholder for the “stinky” project is a local nonprofit that didn’t have the resources to create that kind of application. Students were able to prototype in Observable and turn it into an embeddable app. It’s a fully interactive map displaying data filtered by time-series animations built with D3 brush. Ultimately, it invites non-technical users to explore the data that they help generate. Other regional nonprofits are approaching us with similar project ideas.

Cells from an Observable notebook were embedded and are accessed by a wide range of stakeholders.

What is something only Observable can do that you find critical to your teaching?

Observable is central in my advanced dataviz class where students learn to visualize with D3. That’s a no-brainer. What’s relatively novel is how I incorporate Observable in my intro data science class, where students learn to code in Python and don’t have experience with web technologies. In that class, Observable is transformative for the kind of exploratory data analysis that I liken to pulling a rabbit out of a hat. It’s most fun when students discover something new in data that they thought they already understood.

For example, my favorite dataset is the real-time earthquake feed from the USGS Earthquake Hazards Program. We’ve all seen earthquake data, and there are cool examples all over the Observable Community, but there are ways to explore this dataset that are full of surprises, if you use the right visualization.

This example demonstrates a small number of ways to explore the earthquake dataset. View the complete notebook for more.

Python is so strong for AI and ML that it’s a core language for data science. Python dataviz libraries often use JavaScript under the hood, and many of those use D3. If you give students a Python library for dataviz that’s built with D3 and you don’t show them how it works, then you’re giving them a fish. I prefer to teach them how to fish. So when they get excited by interactive data visualization, I introduce them to Observable so they can learn to do it themselves.

In my intro to data science class, I start with Mike Bostock’s World Airports notebook. In particular, I show students how to use Python to replace the airports with earthquakes. Granted, it’s easy if you’re working in an Observable notebook, but we’re using Python in Jupyter notebooks. A super lightweight Observable-Jupyter package connects Python to JavaScript using the Observable API. Here’s the Python code in a Colab notebook. No sophisticated geospatial Python modules are necessary, and it’s not a black box. It’s all done with built-in Python data structures, so it’s great for my intro programming course. When students see how it works, it sparks their interest in Observable and that demo is just the start.
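As a rough illustration of the "built-in Python data structures" idea described above, here is a minimal sketch (not the code from the Colab notebook) of how a USGS-style GeoJSON feed might be flattened into a plain list of dictionaries. The sample values, the function name `quakes_from_geojson`, and the output field names are all hypothetical. The commented-out `embed` call at the end shows roughly how such a list could be handed to an Observable notebook with the observable-jupyter package; the notebook slug, cell name, and input name there are placeholders, and the exact call should be checked against that package's documentation.

```python
from datetime import datetime, timezone

# Hypothetical sample shaped like a USGS GeoJSON feed response.
# The values are invented for illustration; a real feed would be downloaded
# from the USGS Earthquake Hazards Program instead.
sample_feed = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"mag": 4.2, "place": "example location A", "time": 1700000000000},
            "geometry": {"type": "Point", "coordinates": [-150.0, 61.5, 45.0]},
        },
        {
            "type": "Feature",
            "properties": {"mag": None, "place": "example location B", "time": 1700003600000},
            "geometry": {"type": "Point", "coordinates": [-152.5, 63.0, 110.0]},
        },
    ],
}


def quakes_from_geojson(feed):
    """Flatten GeoJSON features into a list of plain dicts (hypothetical helper)."""
    records = []
    for feature in feed["features"]:
        props = feature["properties"]
        # GeoJSON points from the feed are ordered longitude, latitude, depth (km).
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        if props["mag"] is None:
            continue  # skip events without a reported magnitude
        # The feed reports time as milliseconds since the Unix epoch.
        when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
        records.append(
            {
                "longitude": lon,
                "latitude": lat,
                "depth_km": depth_km,
                "magnitude": props["mag"],
                "place": props["place"],
                "time": when.isoformat(),
            }
        )
    return records


quakes = quakes_from_geojson(sample_feed)
print(quakes)

# Assumed usage with observable-jupyter (placeholder slug, cell, and input names):
# from observable_jupyter import embed
# embed("@your-user/your-notebook", cells=["map"], inputs={"quakes": quakes})
```

Because the result is just a list of dictionaries, students can inspect it with ordinary Python before anything is passed to JavaScript.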

The data really comes to life in extraordinary ways with interactive visualizations – the kind you can only create with Observable. The earthquake dataset is rich with scientifically interesting things that most people have never seen before. For example, I created the visualization below with Observable Plot and D3 brush.

Alaska’s earthquake data offers new insights when incidents are mapped by depth.

You can use this visualization to explore Alaska’s earthquakes. With only one month of data, you can actually see the effects of actively colliding tectonic plates. The quakes occur where the Pacific plate migrates north and plunges underneath Alaska. Since earthquakes get deeper as you go north along the interface between the Pacific and North American plates, the surface quakes look entirely different. It’s hard to explain in words, but the animation brings it to life. I showed this visualization to geophysicists at the USGS Alaska Volcano Observatory recently who had never seen the data visualized that way before, and they were impressed.

"The data really comes to life in extraordinary ways with interactive visualizations – the kind you can only create with Observable." - Philip Bogden, Associate Teaching Professor of Computer Science, Northeastern University

Can you share a bit about your background in data?

I was on the faculty at Yale and UConn when I was doing the tenure-track thing as an oceanographer. I left academia for a while to run a non-profit that provided real-time environmental data as a public service, then I moved to the NSF as a program officer in the Office of Cyber Infrastructure. At NSF, the goal was to combine computer science with other scientific disciplines in order to solve grand challenge problems. The NumPy homepage has some examples, including the first image of a black hole.

I got interested in D3 back in the early days, before Mike Bostock was at the New York Times. I remember going to a talk at a GIS conference (I'm an open source GIS evangelist too, BTW) where I saw the speaker present an incredible interactive application. I was amazed. I went up and asked the speaker, “How did you do that?” and he said, “D3.” My response at the time was, “What's D3?” I started using D3 in my own work and my teaching. At that time, D3 was relatively hard to use for teaching. When Observable came out, Mike amazed me yet again. It’s transformative for education, for reasons described in a Nature article that came out a couple of years ago.

Now I’m teaching full time in the graduate computer science program at Northeastern, specializing in data visualization, data science, web technologies, and AI/ML. I wasn’t really looking for a full-time position, but it's actually a win-win because Northeastern is not a traditional program. It’s a unique experiential learning program, and we get to do a lot of neat projects with local stakeholders.

What are you working on now? What’s next?

I’m creating more case studies that I can integrate into the data-science classroom. I'm looking for scientifically compelling jaw droppers to give to students who are learning Python. The Alaska quakes visualization is an example. These visualizations use interactions and animations that are hard to achieve natively in Python or R, but relatively easy in Observable. They spark the curiosity of students who head over to Observable to see how it’s done.

Several new technologies are helping in my search. Observable-Jupyter is one. It allows me to quickly create interactive visualizations in Observable and give them to students who are learning data science in Jupyter notebooks.

Observable Plot is another, Plot.geo in particular. I'm eager to see what Mike Bostock and Fil Rivière come up with next.

I don't think you'll see JavaScript replace Python in the data science classroom anytime soon. But I do think you'll see more data scientists getting excited about learning enough JavaScript to create their own innovative visualizations in Observable. Those students will be on the cutting edge.

Data Educator Stories, featuring Philip Bogden, PhD, Associate Computer Science Professor at Northeastern University

Courtney Francis
