Bin transform

TIP

The bin transform is for aggregating quantitative or temporal data. For ordinal or nominal data, use the group transform. See also the hexbin transform.

The bin transform groups quantitative or temporal data — continuous measurements such as heights, weights, or temperatures — into discrete bins. You can then compute summary statistics for each bin, such as a count, sum, or proportion. The bin transform is most often used to make histograms or heatmaps with the rect mark.

For example, here is a histogram showing the distribution of weights of Olympic athletes.

Plot.plot({
  y: {grid: true},
  marks: [
    Plot.rectY(olympians, Plot.binX({y: "count"}, {x: "weight"})),
    Plot.ruleY([0])
  ]
})

The binX transform takes x as input and outputs x1 and x2 representing the extent of each bin in x. The outputs argument (here {y: "count"}) declares additional output channels (y) and the associated reducer (count). Hence the height of each rect above represents the number of athletes in the corresponding bin, i.e., the number of athletes with a similar weight.
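To make these outputs concrete, here is a minimal plain-JavaScript sketch of a count-per-bin computation. It is not Plot's implementation: the helper name `binCounts`, the fixed bin width, the half-open bin convention, and the sample weights are all assumptions for illustration, whereas Plot chooses thresholds automatically.

```javascript
// Hypothetical helper: bins values into fixed-width bins and counts them.
// In this sketch each bin is half-open, [x1, x2), and empty bins are omitted.
function binCounts(values, width) {
  const bins = new Map();
  for (const v of values) {
    const x1 = Math.floor(v / width) * width; // lower bound of v's bin
    if (!bins.has(x1)) bins.set(x1, {x1, x2: x1 + width, y: 0});
    bins.get(x1).y += 1; // the "count" reducer
  }
  return [...bins.values()].sort((a, b) => a.x1 - b.x1);
}

const sampleWeights = [52, 58, 61, 64, 67, 83]; // made-up weights in kg
const weightBins = binCounts(sampleWeights, 10);
console.log(weightBins);
// [{x1: 50, x2: 60, y: 2}, {x1: 60, x2: 70, y: 3}, {x1: 80, x2: 90, y: 1}]
```

Each object corresponds to one rect: x1 and x2 give its horizontal extent and y gives its height.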

While the binX transform is often used to generate y, it can output any channel. Below, the fill channel represents count per bin, resulting in a one-dimensional heatmap.

Plot
  .rect(olympians, Plot.binX({fill: "count"}, {x: "weight"}))
  .plot({color: {scheme: "YlGnBu"}})

You can partition bins using z. If z is undefined, it defaults to fill or stroke, if any. In conjunction with the rectY mark’s implicit stackY transform, this will produce a stacked histogram.

Plot.plot({
  y: {grid: true},
  color: {legend: true},
  marks: [
    Plot.rectY(olympians, Plot.binX({y: "count"}, {x: "weight", fill: "sex"})),
    Plot.ruleY([0])
  ]
})

TIP

You can invoke the stack transform explicitly as Plot.stackY(Plot.binX({y: "count"}, {x: "weight", fill: "sex"})) to produce an identical chart.

You can opt out of the implicit stackY transform by having binX generate y1 or y2 instead of y (and similarly x1 or x2 for stackX and binY). When overlapping marks, use either opacity or blending to make the overlap visible.

Plot.plot({
  y: {grid: true},
  marks: [
    Plot.rectY(olympians, Plot.binX({y2: "count"}, {x: "weight", fill: "sex", mixBlendMode: "multiply"})),
    Plot.ruleY([0])
  ]
})
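The difference between stacked and overlapping rects comes down to where each rect's baseline sits. The following plain-JavaScript sketch, using made-up counts for a single bin, illustrates the two layouts; it is an illustration of the idea rather than Plot's stack transform.

```javascript
// Counts for one bin, by series (made-up numbers).
const countsBySex = [
  {sex: "female", count: 30},
  {sex: "male", count: 50}
];

// Stacked: each rect starts where the previous one ended.
let runningTop = 0;
const stacked = countsBySex.map(({sex, count}) => ({
  sex,
  y1: runningTop,
  y2: (runningTop += count)
}));

// Overlapping: every rect starts at the baseline, so rects occlude each other.
const overlapping = countsBySex.map(({sex, count}) => ({sex, y1: 0, y2: count}));

console.log(stacked);     // female spans 0 to 30, male spans 30 to 80
console.log(overlapping); // female spans 0 to 30, male spans 0 to 50
```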

CAUTION

While the mixBlendMode option is useful for mitigating occlusion, it can be slow to render if there are many elements. More than two overlapping histograms may also be hard to read.

The bin transform comes in three orientations:

binX bins on x, and often outputs y as in a histogram with vertical↑ rects;
binY bins on y, and often outputs x as in a histogram with horizontal→ rects; and
bin bins on both x and y, and often outputs to fill or r as in a heatmap.

As you might guess, the binY transform with the rectX mark produces a histogram with horizontal→ rects.

Plot.plot({
  x: {grid: true},
  marks: [
    Plot.rectX(olympians, Plot.binY({x: "count"}, {y: "weight"})),
    Plot.ruleX([0])
  ]
})

You can produce a two-dimensional heatmap with the bin transform and a rect mark by generating a fill output channel. Below, color encodes the number of athletes in each bin (of similar height and weight).

Plot
  .rect(olympians, Plot.bin({fill: "count"}, {x: "weight", y: "height"}))
  .plot({color: {scheme: "YlGnBu"}})

The bin transform also outputs x and y channels representing bin centers. These can be used to place a dot mark whose size again represents the number of athletes in each bin.

Plot.plot({
  r: {range: [0, 6]}, // generate slightly smaller dots
  marks: [
    Plot.dot(olympians, Plot.bin({r: "count"}, {x: "weight", y: "height"}))
  ]
})

We can add the stroke channel to show overlapping distributions by sex.

Plot.plot({
  r: {range: [0, 6]},
  marks: [
    Plot.dot(olympians, Plot.bin({r: "count"}, {x: "weight", y: "height", stroke: "sex"}))
  ]
})

Similarly, the binX and binY transforms generate x and y channels, respectively, for one-dimensional binning.

Plot.plot({
  r: {range: [0, 14]},
  marks: [
    Plot.dot(olympians, Plot.binX({r: "count"}, {x: "weight"}))
  ]
})

In addition to rect and dot, you can even use continuous marks such as area and line. In this case you should set the bin transform’s filter option to null so that empty bins are included in the output; otherwise, the area or line would mislead by interpolating over missing bins.

Plot.plot({
  y: {grid: true},
  marks: [
    Plot.areaY(olympians, Plot.binX({y: "count", filter: null}, {x: "weight", fillOpacity: 0.2})),
    Plot.lineY(olympians, Plot.binX({y: "count", filter: null}, {x: "weight"})),
    Plot.ruleY([0])
  ]
})
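To see why empty bins matter for continuous marks, consider this plain-JavaScript sketch. The helper `countPerBin` and its `includeEmpty` flag are hypothetical stand-ins for the default filter and for filter: null; they are not part of Plot.

```javascript
// Hypothetical helper: counts per fixed-width bin, optionally keeping empty bins.
function countPerBin(values, width, {includeEmpty = false} = {}) {
  const lo = Math.floor(Math.min(...values) / width) * width;
  const hi = Math.floor(Math.max(...values) / width) * width;
  const bins = [];
  for (let x1 = lo; x1 <= hi; x1 += width) {
    const y = values.filter((v) => v >= x1 && v < x1 + width).length;
    if (y > 0 || includeEmpty) bins.push({x1, x2: x1 + width, y});
  }
  return bins;
}

const sparse = [51, 53, 58, 82]; // made-up values with a gap between 60 and 80

// Analogous to the default: empty bins are dropped, so a line drawn through
// these points would slope directly from the 50s bin to the 80s bin.
const omitted = countPerBin(sparse, 10);

// Analogous to filter: null: empty bins are kept, so the line drops to zero.
const kept = countPerBin(sparse, 10, {includeEmpty: true});

console.log(omitted.map((d) => d.y)); // [3, 1]
console.log(kept.map((d) => d.y));    // [3, 0, 0, 1]
```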

The cumulative option produces a cumulative distribution. Below, each bin represents the number of athletes with the given weight or less. To have each bin represent the number of athletes with the given weight or more, set cumulative to -1.

Plot.plot({
  marginLeft: 60,
  y: {grid: true},
  marks: [
    Plot.rectY(olympians, Plot.binX({y: "count"}, {x: "weight", cumulative: true})),
    Plot.ruleY([0])
  ]
})
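The two cumulative directions amount to running totals taken from opposite ends. This plain-JavaScript sketch uses made-up per-bin counts, and the helper names are hypothetical.

```javascript
// Per-bin counts in ascending bin order (made-up numbers).
const perBin = [2, 5, 3, 1];

// Positive cumulative: each bin includes all lesser bins.
function cumulativeAscending(counts) {
  let total = 0;
  return counts.map((c) => (total += c));
}

// Negative cumulative: each bin includes all greater bins.
function cumulativeDescending(counts) {
  let total = 0;
  return counts.slice().reverse().map((c) => (total += c)).reverse();
}

console.log(cumulativeAscending(perBin));  // [2, 7, 10, 11]
console.log(cumulativeDescending(perBin)); // [11, 9, 4, 1]
```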

The bin transform works with Plot’s faceting system, partitioning bins by facet. Below, we compare the weight distributions of athletes within each sport using the proportion-facet reducer. Sports are sorted by median weight: gymnasts tend to be the lightest, and basketball players the heaviest.

Plot.plot({
  marginLeft: 100,
  padding: 0,
  x: {grid: true},
  fy: {domain: d3.groupSort(olympians.filter((d) => d.weight), (g) => d3.median(g, (d) => d.weight), (d) => d.sport)},
  color: {scheme: "YlGnBu"},
  marks: [Plot.rect(olympians, Plot.binX({fill: "proportion-facet"}, {x: "weight", fy: "sport", inset: 0.5}))]
})
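The distinction between the proportion and proportion-facet reducers is the denominator. The sketch below uses made-up counts for two facets and plain JavaScript only; it illustrates the arithmetic, not Plot's internals.

```javascript
// Made-up counts per weight bin for two facets (sports).
const facetCounts = {
  gymnastics: [6, 3, 1],
  basketball: [2, 8, 30]
};

const sumOf = (values) => values.reduce((a, b) => a + b, 0);
const overallTotal = sumOf(Object.values(facetCounts).flat()); // 50

// "proportion": each bin divided by the total across all facets.
const proportion = Object.fromEntries(
  Object.entries(facetCounts).map(([sport, counts]) => [
    sport,
    counts.map((c) => c / overallTotal)
  ])
);

// "proportion-facet": each bin divided by its own facet's total.
const proportionFacet = Object.fromEntries(
  Object.entries(facetCounts).map(([sport, counts]) => [
    sport,
    counts.map((c) => c / sumOf(counts))
  ])
);

console.log(proportion.gymnastics[0]);      // 6 / 50
console.log(proportionFacet.gymnastics[0]); // 6 / 10
```

With proportion-facet, a small facet such as gymnastics uses the full color range, which makes the shapes of the distributions comparable across facets.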

The bin transform sets default insets for a one-pixel gap between rects. You can set explicit insets if you prefer, say if you want the rects to touch. In this case we recommend rounding on the x scale to avoid antialiasing artifacts between rects.

Plot.plot({
  x: {round: true},
  y: {grid: true},
  marks: [
    Plot.rectY(olympians, Plot.binX({y: "count"}, {x: "weight", inset: 0})),
    Plot.ruleY([0])
  ]
})

Bin options

Given input data = [d₀, d₁, d₂, …], by default the resulting binned data is an array of arrays where each inner array is a subset of the input data [[d₁, d₂, …], [d₀, …], …]. Each inner array is in input order. The outer array is in ascending order according to the associated dimension (x then y).

By specifying a reducer for the data output, as described below, you can change how the binned data is computed. The outputs may also include filter and sort options specified as reducers, and a reverse option to reverse the order of generated bins. By default, empty bins are omitted, and non-empty bins are generated in ascending threshold order.

In addition to data, the following channels are automatically output:

x1 - the starting horizontal position of the bin
x2 - the ending horizontal position of the bin
x - the horizontal center of the bin
y1 - the starting vertical position of the bin
y2 - the ending vertical position of the bin
y - the vertical center of the bin
z - the first value of the z channel, if any
fill - the first value of the fill channel, if any
stroke - the first value of the stroke channel, if any

The x1, x2, and x output channels are only computed by the binX and bin transforms; similarly, the y1, y2, and y output channels are only computed by the binY and bin transforms. The x and y output channels are lazy: they are only computed if needed by a downstream mark or transform. Conversely, the x1 and x2 outputs default to undefined if x is explicitly defined; and the y1 and y2 outputs default to undefined if y is explicitly defined.

You can declare additional output channels by specifying the channel name and desired reducer in the outputs object, which is the first argument to the transform. For example, to use binX to generate a y channel of bin counts as in a frequency histogram:

Plot.binX({y: "count"}, {x: "culmen_length_mm"})

The following named reducers are supported:

first - the first value, in input order
last - the last value, in input order
count - the number of elements (frequency)
distinct - the number of distinct values
sum - the sum of values
proportion - the sum proportional to the overall total (weighted frequency)
proportion-facet - the sum proportional to the facet total
min - the minimum value
min-index - the zero-based index of the minimum value
max - the maximum value
max-index - the zero-based index of the maximum value
mean - the mean value (average)
median - the median value
mode - the value with the most occurrences
pXX - the percentile value, where XX is a number in [00,99]
deviation - the standard deviation
variance - the variance per Welford’s algorithm
identity - the array of values
x - the middle of the bin’s x extent (when binning on x)
x1 - the lower bound of the bin’s x extent (when binning on x)
x2 - the upper bound of the bin’s x extent (when binning on x)
y - the middle of the bin’s y extent (when binning on y)
y1 - the lower bound of the bin’s y extent (when binning on y)
y2 - the upper bound of the bin’s y extent (when binning on y)
z - the bin’s z value (z, fill, or stroke); available since version 0.6.14

In addition, a reducer may be specified as:

a function to be passed the array of values for each bin and the extent of the bin
an object with a reduceIndex method, and optionally a scope

In the latter case, the reduceIndex method is repeatedly passed three arguments: the index for each bin (an array of integers), the input channel’s array of values, and the extent of the bin (an object {data, x1, x2, y1, y2}); it must then return the corresponding aggregate value for the bin.

If the reducer object’s scope is data, then the reduceIndex method is first invoked for the full data; the return value of the reduceIndex method is then made available as a third argument (making the extent the fourth argument). Similarly if the scope is facet, then the reduceIndex method is invoked for each facet, and the resulting reduce value is made available while reducing the facet’s bins. (This optional scope is used by the proportion and proportion-facet reducers.)
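As a rough illustration of a scoped reducer object, the sketch below defines a reducer that returns each bin's share of the overall sum. The driver function `applyScopedReducer` is hypothetical and only mimics the calling convention described above; in particular, the arguments of the first, whole-data call are an assumption.

```javascript
// A reducer object in the shape described above. With scope "data", the method
// is first called over every index; that result is then passed back as `basis`.
const shareOfTotal = {
  scope: "data",
  reduceIndex(index, values, basis = 1, extent) {
    let sum = 0;
    for (const i of index) sum += values[i];
    return sum / basis; // on the first pass basis is 1, so this is the total
  }
};

// Hypothetical driver that mimics the calling convention; not Plot's code.
function applyScopedReducer(reducer, binIndexes, values) {
  const everyIndex = values.map((_, i) => i);
  const basis = reducer.reduceIndex(everyIndex, values); // pass over the full data
  return binIndexes.map((index) => reducer.reduceIndex(index, values, basis, {}));
}

const masses = [10, 20, 30, 40]; // made-up channel values
const shares = applyScopedReducer(shareOfTotal, [[0, 1], [2, 3]], masses);
console.log(shares); // approximately [0.3, 0.7]
```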

Most reducers require binding the output channel to an input channel; for example, if you want the y output channel to be a sum (not merely a count), there should be a corresponding y input channel specifying which values to sum. If there is not, sum will be equivalent to count.

Plot.binX({y: "sum"}, {x: "culmen_length_mm", y: "body_mass_g"})

You can control whether a channel is computed before or after binning. If a channel is declared only in options (and it is not a special group-eligible channel such as x, y, z, fill, or stroke), it will be computed after binning and be passed the binned data: each datum is the array of input data corresponding to the current bin.

Plot.binX({y: "count"}, {x: "economy (mpg)", title: (data) => data.map((d) => d.name).join("\n")})

This is equivalent to declaring the channel only in outputs.

Plot.binX({y: "count", title: (data) => data.map((d) => d.name).join("\n")}, {x: "economy (mpg)"})

However, if a channel is declared in both outputs and options, then the channel in options is computed before binning and can then be aggregated using any built-in reducer (or a custom reducer function) during the bin transform.

Plot.binX({y: "count", title: (names) => names.join("\n")}, {x: "economy (mpg)", title: "name"})
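The two styles differ only in what the function receives. This plain-JavaScript sketch, with made-up records standing in for the contents of a single bin, shows that both produce the same title.

```javascript
// Made-up records that all fall in one bin.
const carsInBin = [
  {name: "civic", mpg: 33},
  {name: "corolla", mpg: 31}
];

// Declared in only one place: the function is called after binning
// with the array of data in the bin.
const titleAfterBinning = (data) => data.map((d) => d.name).join("\n");

// Declared in both places: the "name" channel is evaluated before binning,
// and the reducer receives that channel's values for the bin.
const nameChannel = carsInBin.map((d) => d.name);
const titleReducer = (names) => names.join("\n");

console.log(titleAfterBinning(carsInBin) === titleReducer(nameChannel)); // true
```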

To control how the quantitative dimensions x and y are divided into bins, the following options are supported:

thresholds - the threshold values; see below
interval - an alternative method of specifying thresholds
domain - values outside the domain will be omitted
cumulative - if positive, each bin will contain all lesser bins

These options may be specified either on the options or outputs object. If the domain option is not specified, it defaults to the minimum and maximum of the corresponding dimension (x or y), possibly niced to match the threshold interval to ensure that the first and last bin have the same width as other bins. If cumulative is negative (-1 by convention), each bin will contain all greater bins rather than all lesser bins, representing the complementary cumulative distribution.

To pass separate binning options for x and y, use an object with the options above and a value option to specify the input channel values.

Plot.binX({y: "count"}, {x: {thresholds: 20, value: "culmen_length_mm"}})

The thresholds option may be specified as a named method or a variety of other ways:

auto (default) - Scott’s rule, capped at 200
freedman-diaconis - the Freedman–Diaconis rule
scott - Scott’s normal reference rule
sturges - Sturges’ formula
a count (hint) representing the desired number of bins
an array of n threshold values for n - 1 bins
an interval or time interval (for temporal binning; see below)
a function that returns an array, count, or time interval
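For reference, the named rules correspond to well-known formulas for choosing a bin count. The sketch below implements the textbook versions in plain JavaScript; it is not necessarily identical to what Plot does internally, and the function names and the interpolated quantile are assumptions.

```javascript
// Sturges' formula: ceil(log2(n)) + 1 bins.
function sturges(values) {
  return Math.ceil(Math.log2(values.length)) + 1;
}

// Scott's normal reference rule: bin width 3.49 * sd / cbrt(n).
function scott(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1));
  const width = (3.49 * sd) / Math.cbrt(n);
  return Math.ceil((Math.max(...values) - Math.min(...values)) / width);
}

// Freedman-Diaconis rule: bin width 2 * IQR / cbrt(n).
function freedmanDiaconis(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const quantile = (p) => {
    const pos = p * (sorted.length - 1); // linear interpolation between ranks
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, sorted.length - 1);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  };
  const width = (2 * (quantile(0.75) - quantile(0.25))) / Math.cbrt(sorted.length);
  return Math.ceil((sorted[sorted.length - 1] - sorted[0]) / width);
}

// "auto" as described in the list above: Scott's rule, capped at 200.
const autoCount = (values) => Math.min(200, scott(values));

const oneToHundred = Array.from({length: 100}, (_, i) => i + 1);
console.log(sturges(oneToHundred), scott(oneToHundred), freedmanDiaconis(oneToHundred)); // 8 5 5
```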

If the thresholds option is specified as a function, it is passed three arguments: the array of input values, the domain minimum, and the domain maximum. If a number, d3.ticks or d3.utcTicks is used to choose suitable nice thresholds. If an interval, it must expose interval.floor(value), interval.ceil(value), interval.offset(value), and interval.range(start, stop) methods. If the interval is a time interval such as "day" (equivalently, d3.utcDay), or if the thresholds are specified as an array of dates, then the binned values are implicitly coerced to dates. Time intervals are intervals that are also functions that return a Date instance when called with no arguments.
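As an illustration of the interval shape described above, here is a hypothetical numeric interval with a step of 5. The name `everyFive`, the optional step argument to offset, and the half-open semantics of range are assumptions modeled on d3-time conventions.

```javascript
// Hypothetical numeric interval with a step of 5, exposing the four methods.
const everyFive = {
  // The greatest multiple of 5 less than or equal to value.
  floor: (value) => Math.floor(value / 5) * 5,
  // The least multiple of 5 greater than or equal to value.
  ceil: (value) => Math.ceil(value / 5) * 5,
  // The value advanced by the given number of steps (one by default).
  offset: (value, step = 1) => value + step * 5,
  // Interval boundaries in [start, stop): start inclusive, stop exclusive.
  range: (start, stop) => {
    const boundaries = [];
    for (let v = Math.ceil(start / 5) * 5; v < stop; v += 5) boundaries.push(v);
    return boundaries;
  }
};

console.log(everyFive.floor(12));    // 10
console.log(everyFive.ceil(12));     // 15
console.log(everyFive.offset(10));   // 15
console.log(everyFive.range(3, 21)); // [5, 10, 15, 20]
```

An object like this could then be supplied wherever an interval is accepted, for example as the thresholds option.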

If the interval option is used instead of thresholds, it may be an interval, a time interval, or a number. If a number n, threshold values are consecutive multiples of n that span the domain; otherwise, the interval option is equivalent to the thresholds option. When the thresholds are specified as an interval and the default domain is used, the domain will automatically be extended so that its start and end align with the interval.
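The numeric case can be sketched as follows. The helper `multiplesSpanning` is hypothetical; it shows how consecutive multiples of n can cover a domain by extending outward to the nearest multiples, and it assumes an integer n to sidestep floating-point rounding.

```javascript
// Hypothetical helper: consecutive multiples of n that span [min, max].
function multiplesSpanning(min, max, n) {
  const start = Math.floor(min / n) * n; // extend down to a multiple of n
  const stop = Math.ceil(max / n) * n;   // extend up to a multiple of n
  const thresholds = [];
  for (let i = 0; start + i * n <= stop; ++i) thresholds.push(start + i * n);
  return thresholds;
}

console.log(multiplesSpanning(37, 102, 10));
// [30, 40, 50, 60, 70, 80, 90, 100, 110]
```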

The bin transform supports grouping in addition to binning: you can subdivide bins by up to two additional ordinal or categorical dimensions (not including faceting). If any of z, fill, or stroke is a channel, the first of these channels will be used to subdivide bins. Similarly, binX will group on y if y is not an output channel, and binY will group on x if x is not an output channel. For example, for a stacked histogram:

Plot.binX({y: "count"}, {x: "body_mass_g", fill: "species"})
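Conceptually, subdividing bins by a categorical channel means keying each datum by both its bin and its category. This plain-JavaScript sketch uses made-up records and a hypothetical helper with a fixed bin width; it is not Plot's implementation.

```javascript
// Made-up records.
const penguinSample = [
  {mass: 3400, species: "Adelie"},
  {mass: 3700, species: "Adelie"},
  {mass: 3900, species: "Chinstrap"},
  {mass: 4600, species: "Gentoo"},
  {mass: 4900, species: "Gentoo"}
];

// Hypothetical helper: one output cell per (bin, species) pair, with a count.
function binByMassAndSpecies(data, width) {
  const cells = new Map();
  for (const d of data) {
    const x1 = Math.floor(d.mass / width) * width;
    const key = `${x1}|${d.species}`;
    if (!cells.has(key)) cells.set(key, {x1, x2: x1 + width, fill: d.species, y: 0});
    cells.get(key).y += 1;
  }
  return [...cells.values()];
}

const stackedCells = binByMassAndSpecies(penguinSample, 1000);
console.log(stackedCells);
// The 3000-4000 bin yields two cells (Adelie: 2, Chinstrap: 1);
// the 4000-5000 bin yields one cell (Gentoo: 2).
```

Cells that share the same x1 and x2 are the ones a stack transform would pile on top of each other.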

Lastly, the bin transform changes the default mark insets: binX changes the defaults for insetLeft and insetRight; binY changes the defaults for insetTop and insetBottom; bin changes all four.

bin(outputs, options)

Plot.rect(olympians, Plot.bin({fill: "count"}, {x: "weight", y: "height"}))

Bins on x and y. Also groups on the first channel of z, fill, or stroke, if any.

binX(outputs, options)

Plot.rectY(olympians, Plot.binX({y: "count"}, {x: "weight"}))

Bins on x. Also groups on y and the first channel of z, fill, or stroke, if any.

binY(outputs, options)

Plot.rectX(olympians, Plot.binY({x: "count"}, {y: "weight"}))

Bins on y. Also groups on x and the first channel of z, fill, or stroke, if any.
