Edited
Mar 1, 2021
md`# D3 Basic Bar Chart

Normally in an Observable notebook, we would lay out our items with the chart displayed up top, any functions or calculations under it, and utilities like importing d3 down at the bottom. Here, so that you can get a sense of how to build a project, I'm going to walk through the steps in the order that they're usually written. Observable also isn't a peer-reviewed venue, so the standard layout is just a typical practice rather than a hard rule. For your final project, if you choose to work on Observable and it makes more sense to you to walk through things in the order that you build them, that's ok.

The first thing we do when starting a project is import d3 (or other libraries, like [Vega Lite](https://observablehq.com/@uwdata/introduction-to-vega-lite)). We import d3 like this, where @6 specifies the version of d3. If you google around for help or look at examples of other d3 Observable notebooks, check what version number the example is using--d3 version 4 is very different from v6, and what works in older versions won't always work in v6!
`
d3 = require('d3@6')
md`## Set utilities

Next, we specify some utility variables for height and margins. These variables work much like height and margin in css, and we'll use them to size and position the pieces of our chart. Notice that the margin variable is formatted like we would format margin in css. We don't usually need to define width on Observable, because Observable has a default width variable that we can call that will always be 100% of the width of the page (which means that our width will always display nicely on different device sizes! This is called responsive design). If we wanted to specify a different, specific width like we're doing with height, we could just define our own width variable in the same way.`
height = 500
margin = ({
top: 10,
right: 10,
bottom: 40,
left: 35
})
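md`To see how the margins carve up the page, here is a minimal sketch in plain JavaScript (written with const so it also runs outside Observable). The width of 800 is an assumption for illustration; in the notebook, Observable's built-in width variable supplies the real value. The names plotWidth and plotHeight are hypothetical and are not used elsewhere in this notebook.`

```javascript
// Hypothetical page width; in Observable the built-in width variable supplies this.
const width = 800;
const height = 500;
const margin = { top: 10, right: 10, bottom: 40, left: 35 };

// The drawable area is whatever is left after subtracting the margins.
const plotWidth = width - margin.left - margin.right;
const plotHeight = height - margin.top - margin.bottom;

console.log(plotWidth, plotHeight); // 755 450
```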
md`## Get data

Now we'll import our data by attaching the file and telling Observable to interpret it as a csv. Try removing .csv() from FileAttachment below--instead of an array of objects with keys and values (column names and cell values), Observable can only 'see' that there's a file there, not what's in it.`
data = FileAttachment("1915State.csv").csv()
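md`To get a feel for what "interpret it as a csv" means, here is a rough sketch of the idea in plain JavaScript. The sample text and the parseCsv helper are hypothetical stand-ins (the real file has more rows and columns, and a real parser also handles quoted fields). The thing to notice is that every value comes back as a string, which is why +d.year shows up later in this notebook.`

```javascript
// Hypothetical two-row sample standing in for the attached file.
const text = "name,birth_year\nMary,1880\nJohn,1852";

// Naive parser: assumes no quoted fields and no commas inside values.
function parseCsv(csvText) {
  const [header, ...lines] = csvText.trim().split("\n");
  const columns = header.split(",");
  return lines.map(line => {
    const cells = line.split(",");
    // Pair each column name with the cell in the same position.
    return Object.fromEntries(columns.map((col, i) => [col, cells[i]]));
  });
}

const rows = parseCsv(text);
console.log(rows.length);               // 2
console.log(typeof rows[0].birth_year); // "string"
console.log(+rows[0].birth_year + 1);   // 1881
```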
md`Sometimes it's helpful to see our data laid out more neatly. This isn't necessary, but it's useful for checking the data ourselves or for letting our reader see it. There's no built-in table maker in d3 or Observable, so we'll import a table helper function from someone who's already written one. (Try clicking through to [@observablehq/inputs](https://observablehq.com/@observablehq/input-table) to see what else you can do with this table helper!) Observable writes a lot of helper functions, so it's worth checking if someone has already written a helper for something you want to do before writing it from scratch.`
import { Table } from "@observablehq/inputs"
md`Once we've imported the table helper, we can make a nice table of our own. Check out [@observablehq/inputs](https://observablehq.com/@observablehq/input-table) and see if you can sort this table by birth_year.`
Table(data)
md`Notice that our column names are all lowercase with underscores instead of spaces (i.e., birth_year instead of Birth Year). This makes life easier when working in d3 because if your column names have no spaces, you can reference them like d.birth_year. If they have spaces, you have to reference them like d["Birth Year"], which can be a hassle in some places.

D3 is best as a presentation tool rather than an analysis tool, so it's best to go into a D3 visualization with a strong idea of what your final product will look like, rather than exploring your data. (I often do my exploration work in Tableau and then build in D3 if I want to do something Tableau can't do, like make specific interactions).

In this example we're going to visualize a count of individuals by birth year in the 1915 NY state census of Albany. Generally when working in D3, it makes life easier if your data is already in the 'shape' that you want to visualize it. That means that often when we're working with individual-level data (like the individual people in a census), it's helpful to sum or aggregate the items we're interested in using Python or another program before bringing it into D3. We can aggregate data using D3, but it's often a pain in the neck.`
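md`The difference between the two naming styles is easy to see in a tiny example. The two rows below are made up for illustration, as are the accessor names yearTidy and yearSpaced.`

```javascript
// Hypothetical rows illustrating the two column-naming styles.
const tidy = { birth_year: "1880" };
const spaced = { "Birth Year": "1880" };

// Accessor functions like the ones we hand to d3.
const yearTidy = d => d.birth_year;       // dot notation works without spaces
const yearSpaced = d => d["Birth Year"];  // bracket notation is required with spaces

console.log(yearTidy(tidy));     // "1880"
console.log(yearSpaced(spaced)); // "1880"
```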
md`## Aggregate the data

To make our data easier to work with, we're going to make a summary dataset--we're basically going to summarize only the information we're interested in, and only look at that instead of the *data* variable that contains all of our individual-level data. Below, we're using the [d3.rollup function](https://observablehq.com/@d3/d3-group) to create a summary variable *birth_years*. Because we're only interested in counting the number of people born in each year, we give d3.rollup three inputs: the *data* to summarize, a function that reduces each group of rows to one value (here v.length, the number of rows in the group), and a function that says which column to group by (here birth_year). Think of each year as a category: for every category, we're counting how many rows fall into it.`
birth_years = d3.rollup(data, v => v.length, d => d.birth_year)
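md`If the rollup call feels like magic, here is a minimal sketch of the same idea in plain JavaScript: group the rows by a key, then reduce each group to one value. The rollupSketch function and the three sample rows are hypothetical, written only to illustrate the technique; this is not d3's actual implementation.`

```javascript
// Hypothetical individual-level rows.
const data = [
  { name: "Mary", birth_year: "1880" },
  { name: "John", birth_year: "1852" },
  { name: "Anna", birth_year: "1880" }
];

// Group rows by key, then reduce each group to a single value.
function rollupSketch(rows, reduce, key) {
  const groups = new Map();
  for (const row of rows) {
    const k = key(row);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(row);
  }
  const result = new Map();
  for (const [k, group] of groups) result.set(k, reduce(group));
  return result;
}

// Same argument order as the cell above: data, reducer, key.
const birthYears = rollupSketch(data, v => v.length, d => d.birth_year);
console.log(birthYears.get("1880")); // 2
console.log(birthYears.get("1852")); // 1
```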
md`In our *aggregate* variable below, we're "flattening" the Map that d3.rollup gives us. The ***birth_years*** and ***aggregate*** variables include the same data but they're formatted differently. Below, the Array.from() function takes the input *birth_years*, and we tell it to reformat each [year, count] entry into a {year, count} object. This doesn't look like a big visual difference, but it gives us an array of objects, which is basically a summary spreadsheet of our data with one row per year. If we had created a summary spreadsheet in another program before bringing our data to D3, we wouldn't need to do this step.`
aggregate = Array.from(birth_years, ([year, count]) => ({ year, count }))
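md`Here is the same reshaping step on a tiny hand-made Map, so you can see the before and after side by side. The two entries are made-up sample values, not counts from the census file.`

```javascript
// Hypothetical stand-in for the Map that the rollup step produces.
const birthYears = new Map([["1880", 2], ["1852", 1]]);

// Before: each entry is a [key, value] pair.
console.log([...birthYears]); // [["1880", 2], ["1852", 1]]

// After: each entry becomes an object with named properties.
const aggregate = Array.from(birthYears, ([year, count]) => ({ year, count }));
console.log(aggregate); // [{ year: "1880", count: 2 }, { year: "1852", count: 1 }]
```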
Table(aggregate, {
sort: "year"
})
md`## Formatting

Now that we have our data formatted nicely, we're going to define or describe what our x and y axes look like. Think of this as telling Observable how big our canvas is before we start drawing on it. First, we'll figure out the minimum and maximum extent of each axis. We could do this two ways (remember, there is often more than one way to solve any given problem).

For our first option, we could get individual variables for the min and max, using d3.min() and d3.max() like we did in the Intro to Observable assignment. This works great for numeric data, but won't work for text data.

For our second option, we could get all possible entries in the column we're looking at and sort them how we want before building our axis. Our year data is numbers, but we could do the following if we were, for example, counting individuals by occupation instead of birth_year. Below, \`\`aggregate.map(d => +d.year)\`\` takes the aggregate data variable we made, and uses .map() to pull out the value of year from each row, converted to a number (the values are already unique because *aggregate* has one row per year). If we were using text categorical data, we might not need to sort this new array; we could just leave it in the order it is in our dataset. However, sometimes it might make sense to order our x-axis alphabetically or by some other ordering device. Be aware that .sort() with no arguments compares items as text, even when they are numbers. That happens to put four-digit years in ascending order (from smallest to largest), but for numbers in general it is safer to pass a comparator such as d3.ascending.`
xDomain = aggregate.map(d => +d.year).sort(d3.ascending)
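md`Sorting numbers in JavaScript has a well-known gotcha, shown in the small sketch below. The values are made up (they are not from our dataset) and were chosen because they have different numbers of digits, which is where the default sort goes wrong.`

```javascript
// Hypothetical values with different numbers of digits.
const values = [5, 40, 100, 9];

// With no comparator, sort() compares the items as text.
const defaultSorted = [...values].sort();
console.log(defaultSorted); // [100, 40, 5, 9]

// A numeric comparator sorts from smallest to largest.
const numericSorted = [...values].sort((a, b) => a - b);
console.log(numericSorted); // [5, 9, 40, 100]
```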
md`### Formatting the X Axis

Below we're going to define our x-axis, or the marks along the bottom of the chart. [d3.scaleBand()](https://observablehq.com/@d3/d3-scaleband) is a function that gives us methods for handling categorical data, which is handy for bar charts. Here we're going to use it to define a variable *xScale* that maps each year to a position along the x-axis. Domain describes the values in your dataset (for a band scale, the list of categories), while range is the "physical" space on the screen that your data is displayed on.`
xScale = d3
  .scaleBand()
  .domain(xDomain) // the categories to display, one band per entry. A band scale needs the full
  // list of values (every year that gets a bar), not just the minimum and maximum.
  // You could type the list by hand, but hard-coding makes the code harder to reuse.
  // A [min, max] pair from d3.min() and d3.max() is what a continuous scale expects instead.
  .range([margin.left, width - margin.right]) // draw the x axis starting at the pixel defined
  // by margin.left, and draw it across the screen until it arrives at a pixel
  // just left of the right margin.
  .padding(0.5) // give each bar a pad between itself, the bars next to it, and the right and left
  // sides of the chart. .padding() is on a scale of 0 to 1, with no space between bars at 0
  // and only space shown with 1
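md`To make domain, range, and padding more concrete, here is a simplified sketch of the arithmetic behind a band scale in plain JavaScript. The bandScaleSketch function is hypothetical and reflects my understanding of how the layout works when the same padding is used between bars and at the edges, with no rounding; treat it as an illustration rather than d3's actual code. The three years and the 0 to 350 pixel range are made-up values chosen so the numbers come out round.`

```javascript
// Simplified band layout: one padding value for inner and outer gaps, no rounding.
function bandScaleSketch(domain, [start, stop], padding) {
  const n = domain.length;
  // Each step holds one bar plus one gap; the outer gaps use up extra space.
  const step = (stop - start) / Math.max(1, n - padding + padding * 2);
  // Center the bars within the range.
  const offset = start + (stop - start - step * (n - padding)) * 0.5;
  const bandwidth = step * (1 - padding);
  const positions = new Map(domain.map((d, i) => [d, offset + step * i]));
  const scale = d => positions.get(d);
  scale.bandwidth = () => bandwidth;
  return scale;
}

const x = bandScaleSketch([1850, 1851, 1852], [0, 350], 0.5);
console.log(x(1850), x(1851), x(1852)); // 50 150 250
console.log(x.bandwidth());             // 50
```

md`With a padding of 0.5, each bar is as wide as the gap next to it: 50 pixels of bar, then 50 pixels of space.`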
md `Once we have the x-axis defined, we tell d3 how to draw it. Here, [d3.axisBottom()](https://github.com/d3/d3-axis) says we want to draw the axis we just defined at the bottom of our chart.`
xAxis = d3.axisBottom(xScale)
md`### Formatting the Y Axis

Now that we have our x-axis defined, we'll define our y-axis (the vertical one). First we get the maximum extent of our count, so that d3 knows how tall to draw our y-axis. The input (aggregate, d => d.count) tells the .max() function to look in the aggregate array, read the count property of each object, and find the maximum of those numbers. (If we had an array of just numbers, we could input only the name of the array, but our aggregate array includes a lot of stuff so we have to specify what we want the max() function to look at).`
yMax = d3.max(aggregate, d => d.count)
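md`For comparison, here is roughly the same calculation in plain JavaScript, using a made-up aggregate array. One difference worth knowing: on an empty array Math.max() gives -Infinity, while d3.max() gives undefined, and d3.max() also skips missing values.`

```javascript
// Hypothetical summary rows.
const aggregate = [
  { year: "1880", count: 2 },
  { year: "1852", count: 7 },
  { year: "1901", count: 4 }
];

// Pull out the counts, then take the largest one.
const counts = aggregate.map(d => d.count);
const yMax = Math.max(...counts);

console.log(counts); // [2, 7, 4]
console.log(yMax);   // 7
```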
md `Our *yScale* variable looks and works much like our *xScale* variable. The only change here is that because the data we'll be charting on our y axis is a continuous count of numbers, we're going to use .scaleLinear() instead of .scaleBand().`
yScale = d3
  .scaleLinear() // use a linear count of numbers to draw the y-axis
  .domain([0, yMax]) // the numbers will count starting at 0 and extending to the largest number
  // in our count column of the aggregate array
  .range([height - margin.bottom, margin.top]) // the "physical" space of the canvas to be drawn on,
  // starting up a few pixels from the bottom and extending
  // up to the top margin
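md`The range above looks backwards (big number first) because pixel positions in an SVG are measured from the top of the canvas, so a count of 0 has to land near the bottom. Here is a minimal sketch of the arithmetic a linear scale does, in plain JavaScript. The linearScaleSketch function is hypothetical, and the maximum count of 100 is an assumed value for illustration.`

```javascript
// Map a value from the domain [d0, d1] onto the range [r0, r1] in a straight line.
function linearScaleSketch([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 500;
const margin = { top: 10, bottom: 40 };
const yMax = 100; // assumed maximum count

const y = linearScaleSketch([0, yMax], [height - margin.bottom, margin.top]);

console.log(y(0));   // 460, the bottom of the chart
console.log(y(50));  // 235, halfway up
console.log(y(100)); // 10, the top of the chart
```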
md `Likewise, our *yAxis* variable looks familiar. There's a new addition here with the [.tickSizeOuter(0) modifier](https://github.com/d3/d3-axis#axis_tickSizeOuter), which is an optional modifier that styles the y-axis. Try removing or commenting out just .tickSizeOuter() or changing the number to see what it does.`
yAxis = d3.axisLeft(yScale).tickSizeOuter(0)
md`## Drawing the Chart

Finally! Now that we have all our pieces defined, we can draw our chart.

D3 uses a special kind of html object called an SVG, or Scalable Vector Graphics element, to draw charts. SVGs can be an image file format like a .png or .jpg, or they can be drawn on a webpage. Whether a file format or drawn using a program like we're about to do, an SVG is defined mathematically rather than with colored pixels. This is especially handy for data visualization because it means that if we were to come back and write a zoom function for our chart, we could zoom in infinitely without losing detail, because the chart can be mathematically redrawn on every zoom, rather than relying on the information already contained in static pixels.

Below, the function to draw the chart looks complicated, but it has only 5 steps:
1. select the "canvas" on the page to draw the chart
2. describe how to draw the bars of the chart
3. describe how to draw the x axis on the bottom
4. describe how to draw the y axis on the left side
5. allow d3 to actually draw the whole chart

Notice that we use variables below as inputs to draw our chart. This helps make our code easier to read and easier to change if we decide later we need to adjust something.`
bars = {
  // step 1
  const svg = d3.select(DOM.svg(width, height)); // create an svg canvas with the width and height defined
  // by our variables (DOM is Observable's helper for making page elements) and select it with d3

  // step 2
  svg
    .append('g')
    .selectAll('rect') // use svg rect objects to draw our bars
    .data(aggregate) // use our aggregate array as the data to display
    .join('rect') // create one rect object for each row of the data
    .attr('class', 'bars') // give the objects we're about to draw the "bars" css class
    .attr('x', d => xScale(d.year)) // use the year column of aggregate to position the bar on the x axis
    .attr('y', d => yScale(d.count)) // use the count column to position the top of the bar on the y axis
    .attr('width', xScale.bandwidth()) // .bandwidth() is a special function of scaleBand
    // it returns the width of the band (bar) based on the xScale variable configuration
    // we set up earlier
    .attr('height', d => yScale(0) - yScale(d.count)) // remember that yScale(0) is the y position of the
    // bottom of the chart, so we subtract the y position of the top of the
    // bar yScale(d.count) from it to get the total height of the bar.
    .style('fill', 'steelblue'); // finally fill in each bar with the steelblue html color. Fill can also take
  // an rgba() or #hex value color like css. Try changing this to another color

  // step 3
  // Here we render the x axis. Try commenting out this section down to the step 4 below to see what it does
  svg
    .append('g')
    .attr('class', 'x-axis') // give the x-axis group the css class x-axis
    .attr('transform', `translate(0,${height - margin.bottom})`) // set the group's position
    // to the bottom of the chart
    .call(xAxis) // then just call this to render it
    .selectAll("text") // select the text of our x-axis
    .attr("x", 9) // shift the text 9 pixels away from the axis line
    .attr("dy", ".35em") // adjust the left-right orientation of the text labels
    // to line them up with the tick marks
    .attr("transform", "rotate(90)") // rotate the text 90 degrees.
    // Try changing 90 to another number between 0 and 360
    .style("text-anchor", "start"); // align the text labels at the start of the svg.
  // Try changing "start" to "middle" or "end" and see how the x-axis labels change

  // step 4
  // it works the same for the y axis. Less complicated here because we have fewer changes to make from the default.
  svg
    .append('g')
    // .attr('class', 'y-axis') // give the y-axis the y-axis class
    .attr('transform', `translate(${margin.left},0)`) // set the position relative to the left margin
    .call(yAxis); // draw the axis

  // step 5
  return svg.node(); // after performing all the operations above, return tells the bars variable that
  // it can draw the chart defined by svg
}
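md`It can help to see what step 2 actually works out for each bar. The sketch below does the same x, y, width, and height arithmetic in plain JavaScript and writes the result as SVG text by hand. Everything in it is a stand-in chosen for illustration: the two data rows, the bar positions in xPos, the bandwidth, and the simplified yScale function (which assumes a maximum count of 20). In the notebook, d3 and our scale variables do this work for us.`

```javascript
// Hypothetical data and scales standing in for aggregate, xScale, and yScale.
const aggregate = [{ year: 1850, count: 10 }, { year: 1851, count: 20 }];
const xPos = new Map([[1850, 50], [1851, 150]]); // left edge of each bar
const bandwidth = 50;
const yScale = count => 460 - (count / 20) * 450; // counts 0..20 map to pixels 460..10

// One rectangle per row, using the same formulas as the attr() calls in step 2.
const rects = aggregate.map(d => ({
  x: xPos.get(d.year),
  y: yScale(d.count),                     // top edge of the bar
  width: bandwidth,
  height: yScale(0) - yScale(d.count)     // baseline minus top edge
}));

// Write the rectangles out as SVG markup.
const svgMarkup =
  '<svg width="250" height="500">' +
  rects
    .map(r => `<rect x="${r.x}" y="${r.y}" width="${r.width}" height="${r.height}" fill="steelblue"></rect>`)
    .join("") +
  "</svg>";

console.log(rects[0]); // { x: 50, y: 235, width: 50, height: 225 }
console.log(svgMarkup);
```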
md`The easiest way to learn something is often to fiddle with it. Try forking this notebook and charting the number of individuals per birth_place instead of birth_year. By forking, you won't need to re-write all the code, but you will need to do a careful read for where the code refers to year and change that to your relevant birth_place data. Hint: start with the *aggregate* variable and check what + is doing in the xDomain variable. Remember that +thing coerces thing from a string to a number--this is good when we're working with numbers like birth_year, but will give us NaN ("not a number") instead of a usable value when working with text like birth_place.

When you're done playing with the basic bar chart, move on to [the grouped bar chart example](https://observablehq.com/@mkane2/d3-grouped-bar-chart).`
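md`As a nudge for the exercise, here is a small sketch of what + does to text. The place names are made up for illustration and may not match the values in the birth_place column.`

```javascript
// + turns a numeric string into a number...
console.log(+"1880"); // 1880

// ...but turns ordinary text into NaN ("not a number").
const places = ["Ireland", "New York", "Germany"]; // hypothetical birth places
const coerced = places.map(p => +p);
console.log(coerced); // [NaN, NaN, NaN]

// For text categories, skip the + and sort the strings directly.
const sortedPlaces = [...places].sort();
console.log(sortedPlaces); // ["Germany", "Ireland", "New York"]
```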