**Problem:** Morton space is too large to densely represent as a bitvector (2^32 is a lot).
**Possible solution:** Do we really need that much zoom? Maybe for Geo, but we're not necessarily in that world, and even there our current approach is almost down to the city block level from fully zoomed out to the USA.
- A maximum zoom of 2^28 would shave four bits (two x/y zoom levels) off each code, but still gives us 268,435,456 addressable cells
Can **speed up sparse select** (multiplicity vector) by doing the two-level trick of storing zero blocks separately and using rank0:
1. Select on the combined block vector containing only the blocks with at least one 1.
2. The result will belong to some block, so calculate that block's index.
3. Rank0 on the block summary vector to see how many all-zero blocks there were before the block the select landed in; add those (times the block size) back to the result and return it.
- Could even store **just 32 or 64 bits worth of "skip" block info**, and could represent it inverted so a 1 bit means an all-zero block. Then there are basically no memory lookups needed for the correction information that adds back the skipped zero blocks, which should speed up the worst case of select substantially (lots of repeated elements in the unary-coded multiplicity vector). A minimal sketch follows below.
- (The set-with-multiplicity representation idea is from the RRR paper.)
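A minimal sketch of the three steps above, assuming 32-bit blocks, a `nonzero` array holding only the blocks that contain at least one 1, and a single u32 `summary` in the inverted representation (bit b = 1 means block b of the full vector is all zeros, so at most 32 blocks). All names here are hypothetical.

```js
function select1(nonzero, summary, i) {
  // Step 1: find the (i+1)-th set bit within the concatenated nonzero
  // blocks (a plain scan here; a real version would keep rank samples).
  let remaining = i + 1;
  let block = 0, bit = 0, found = false;
  outer: for (; block < nonzero.length; block++) {
    for (bit = 0; bit < 32; bit++) {
      if ((nonzero[block] >>> bit) & 1 && --remaining === 0) {
        found = true;
        break outer;
      }
    }
  }
  if (!found) return -1; // fewer than i + 1 ones in the whole vector
  // Steps 2 and 3: `block` counts only nonzero blocks; walking the summary
  // to recover its index b in the full vector adds back the skipped
  // all-zero blocks, i.e. the rank0-style correction described above.
  let seen = -1, b = 0;
  while (true) {
    if (!((summary >>> b) & 1) && ++seen === block) break;
    b++;
  }
  return b * 32 + bit; // position in the full (virtual) bit vector
}
```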
Fast Rank/Select using non-wasm-accessible instructions (https://blog.acolyer.org/2017/08/08/a-general-purpose-counting-filter-making-every-bit-count/):
> SELECT and RANK can be efficiently implemented on 64-bit vectors using the x86 instruction set from the Haswell line of processors. SELECT is implemented using the PDEP and TZCNT instructions:
>
> SELECT(v,i) = TZCNT(PDEP(2^i,v))
>
> RANK is implemented using the POPCOUNT instruction:
>
> RANK(v,i) = POPCOUNT(v & (2^i -1))
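wasm doesn't expose PDEP/TZCNT, so here is a hedged portable sketch of the same two operations on one 64-bit word held in a BigInt (a slow scalar fallback, just to pin down the semantics; the function names are hypothetical).

```js
function popcount64(v) {
  let c = 0;
  for (; v; v &= v - 1n) c++; // Kernighan: clear the lowest set bit
  return c;
}

// RANK(v, i) = POPCOUNT(v & (2^i - 1)): ones strictly below position i.
function rank64(v, i) {
  return popcount64(v & ((1n << BigInt(i)) - 1n));
}

// SELECT(v, i): position of the i-th (0-based) set bit, standing in for
// TZCNT(PDEP(2^i, v)). Returns 64 if there are fewer than i + 1 set bits.
function select64(v, i) {
  for (let pos = 0n; pos < 64n; pos++) {
    if ((v >> pos) & 1n && i-- === 0) return Number(pos);
  }
  return 64;
}
```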
- **With 2^24 universe**
- Can "sort" the raw data directly: Compute the truncated 24-bit morton code, then set the entry in the 2^24 occupancy bitmap, and increment the count in a parallel 2^24-element counts array. Presto, linear sort, and we immediately get our data structure (other than converting the multiplicities into unary coding).
- can ship the 2mib codes to the frontend and compute the indices here, for low-bandwidth interactive exploration
- can we make one of these for eg. every CSV on NYC's data portal?
- No need even for an intermediate `code` array. Oh, except we want to sort the whole dataframe by this stuff so that our indices match the rest of the data. (Another way to put it: we want to encode the **sort permutation vector**.) Can we store _that_ concisely? _(As a wavelet tree? See [Section 4.2](https://www.sciencedirect.com/science/article/pii/S1570866713000610); except that is about reordering symbols, but we have a …)_
- note: the following is more nuanced than it seemed; do we need the inverse perm?
- To store the permutation, [Succinct representations of permutations and functions](https://www.google.com/search?q=succinct+representation+of+permutation+of+integers) has an approach based on "shortcut pointers" that is the same ol' bitvector-plus-regularly-sampled-values trick
- can use this to have just the spatial data on the frontend, and use the permutation to query a backend for additional information per tile (which now corresponds to an unordered set of original ids, assuming they were 0-n ordered)
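A minimal sketch of the linear "sort" above, assuming 32-bit input Morton codes truncated to their top 24 bits; `buildIndex` is a hypothetical name.

```js
function buildIndex(codes /* Uint32Array of 32-bit Morton codes */) {
  const occupancy = new Uint32Array((1 << 24) / 32); // 2^24 bits = 2 MiB
  const counts = new Uint32Array(1 << 24);           // one count per cell
  for (const code of codes) {
    const c = code >>> 8;                // truncated 24-bit code
    occupancy[c >>> 5] |= 1 << (c & 31); // set the occupancy bit
    counts[c]++;                         // bump the multiplicity
  }
  return { occupancy, counts }; // unary-coding the counts is a later pass
}
```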
**Bit vector post:** Start from a Boolean array. Show how we can condense it. Then ask about range queries. Convert into a prefix sum; show that rank is just accessing that element, and show the range query. But now there's lots of wasted space (in fact even a Boolean array typically costs 8 bits per element). The prefix sum is a sorted set of unique numbers, and its max value is the number of trues, so we can represent each entry by a byte. But then how do we do rank? Can add a parallel array storing prefix sums, but now on blocks — so 1/32nd (or 2/32nds) of our previous total. State of the art is <5% overhead; see the RRR and "simple fast" papers. A block-level sketch follows.
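A sketch of the block idea, assuming one u32 cumulative count per 32-bit block for simplicity (real designs sample much more coarsely; see RRR). Names are hypothetical.

```js
function buildRank(bits /* Uint32Array of packed bits */) {
  const blockRanks = new Uint32Array(bits.length); // ones before each block
  for (let sum = 0, b = 0; b < bits.length; b++) {
    blockRanks[b] = sum;
    sum += popcount32(bits[b]);
  }
  return blockRanks;
}

function rank1(bits, blockRanks, i) { // ones strictly before position i
  const block = i >>> 5;
  const mask = (1 << (i & 31)) - 1; // the block's bits below position i
  return blockRanks[block] + popcount32(bits[block] & mask);
}

function popcount32(x) { // standard SWAR popcount
  x -= (x >>> 1) & 0x55555555;
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return (x * 0x01010101) >>> 24;
}
```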
[] read timsort: https://svn.python.org/projects/python/trunk/Objects/listsort.txt
- possibly faster popcount (is wasm SIMD AVX2? apparently it is more like SSE): [Faster Population Counts Using AVX2 Instructions](https://arxiv.org/abs/1611.07612)
- We can do range count on all symbols in a wavelet tree **simultaneously** by descending down both branches of the tree each time, thus saving the work that would otherwise be repeated at each level.
- Carefully analyze the trade-offs of adopting a **Huffman** tree for the wavelet tree. I think there are trade-offs, including the fact that we no longer get to choose the order of the symbols, since they have to be ordered by the Huffman coder. And think about our access patterns, which include wanting to compute bar charts indicating the frequency of every element, regardless of how uncommon it is.
try the Iowa liquor sales dataset!
- If we want to derive a bit vector for attribute.value == x, I think we can actually do this top-down with the wavelet tree by using boolean bit ops.
- Start with bitvector of ones.
- If the symbol is in the right half at the top level, then we want to AND with that level (otherwise and-not).
- If the symbol is in the right half at the next level, we want to AND with the next level, and so on. I think we could even do ranges this way, though it seems more complicated (and I'm still not sure this is even right). The idea is that eventually each bit ends up being an AND or ANDNOT vertically across levels, in a way that might totally not work, since each 'column' does not consistently represent the same element, so um, yeah, nvm?
- With a wavelet matrix, all of the zeros go to the left and all of the ones to the right.
- If we reserve the leftmost or rightmost symbol to indicate "filtered", what update do we need to make to a wavelet matrix to "send" every filtered element to (0, ... 0) or (1, ..., 1)?
// image = FileAttachment("image.png").image() // from comment here: https://codeforces.com/blog/entry/52854
- Wavelet Trees for All: https://www.sciencedirect.com/science/article/pii/S1570866713000610
- [Translating Between Wavelet Tree and Wavelet Matrix Construction](https://arxiv.org/abs/2002.08061)
- [Wavelet Trees for Competitive Programming.pdf](${await FileAttachment("Wavelet Trees for Competitive Programming.pdf").url()})
- **We might just want a wavelet tree (not matrix)**, since our alphabet is always known at construction time and is never greater than the number of data points. The paper describes the matrix as the better choice when the alphabet is big relative to the data size. **Though**: in Wavelet Trees for All, he says the matrix is/can be faster than the levelwise representation (and I do not want pointers).
- "The wavelet matrix has the advantage of being implementable using only O(log σ) extra words of memory instead of the O(σ) used to store the tree structure in the pointer based alternative while maintaining fast operations."
- **Toggle**:
- Says how to toggle individual elements one-by-one, but maybe can shed light on how to apply an "active mask" in one go...
- **We can filter the scatterplot by specific attribute values** – just compute a mask vector. Might still be expensive to back out a mask vector for a specific attribute range in a wavelet matrix, but we can do it (plus if we have multiple attributes then we have to intersect those masks).
- Then do rank queries on the mask as a way to compute the filtered count per bin in the scatterplot (see the sketch after this list).
- Much harder to filter the attributes themselves based on other attributes – can be thought of as **lazy deletion** from a wavelet matrix. Maybe not so hard once I understand the data structure... Maybe ask **zach** or **edemaine**?
- wavelet matrix does not allow dynamically changing alphabet, but maybe there is a cheap way to **change a value from one symbol to another** with a special symbol to mark deletion. Would still need to do this in bulk – "rename all symbols based on this mask". Special symbol could be at the start or end of the alphabet, if it helps any...
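A sketch of the mask-plus-rank filtering idea above, reusing `rank1`/`buildRank` from the bit vector sketch earlier. Assumes points are in Morton order and each bin corresponds to an index range [lo, hi); the function name is hypothetical.

```js
function filteredCounts(mask, blockRanks, bins /* array of [lo, hi) pairs */) {
  // Filtered count per bin = ones in the mask inside the bin's index range.
  return bins.map(
    ([lo, hi]) => rank1(mask, blockRanks, hi) - rank1(mask, blockRanks, lo)
  );
}
```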
https://twitter.com/simonw/status/1572285367382061057

> If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?
[] paste all of my note-to-self emails into this document
// todo: histogramY, histogramX that roll up a 2d array by summing rows/cols
// then cumsum on those is x/y cdf
// then we can report, when hovering, "32% of x values are less than this" (vs. 50% in the other group)
// log(color) takes a meaningful fraction of total frame time (3-6ms); can we avoid bin normalization and do it all on the gpu?
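A hedged sketch of the rollups described in the todo above, for a w×h row-major bin grid; all names are assumptions, not existing functions.

```js
function histogramX(bins, w, h) { // sum each column
  const out = new Float64Array(w);
  for (let y = 0; y < h; y++)
    for (let x = 0; x < w; x++) out[x] += bins[y * w + x];
  return out;
}

function histogramY(bins, w, h) { // sum each row
  const out = new Float64Array(h);
  for (let y = 0; y < h; y++)
    for (let x = 0; x < w; x++) out[y] += bins[y * w + x];
  return out;
}

function cdf(hist) { // cumsum, normalized so the last entry is 1
  const out = Float64Array.from(hist);
  for (let i = 1; i < out.length; i++) out[i] += out[i - 1];
  const total = out[out.length - 1] || 1;
  return out.map(v => v / total);
}
// e.g. cdf(histogramX(bins, w, h))[x] -> "32% of x values are less than this"
```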
- Can we use a ddsketch-style binning strategy to get dynamic quantiles? Counts based on bins, cumsum, report. (Rough sketch after this list.)
- would need to be able to support potentially thousands of independent attribute values (this is how many bins you can get with ddsketch)
- Related: https://dotat.at/@/2022-07-15-histogram.html, https://github.com/fanf2/hg64
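A rough sketch of the ddsketch-style idea above: log-spaced bins with relative error alpha. The bin count grows with the data's dynamic range, which is where the "potentially thousands of bins" worry comes from. `makeSketch` is a hypothetical name and this is a simplification of the real DDSketch.

```js
function makeSketch(alpha = 0.01) {
  const gamma = (1 + alpha) / (1 - alpha);
  const logGamma = Math.log(gamma);
  const counts = new Map(); // bin index -> count (positive values only)
  let total = 0;
  return {
    add(x) { // assumes x > 0
      const k = Math.ceil(Math.log(x) / logGamma);
      counts.set(k, (counts.get(k) ?? 0) + 1);
      total++;
    },
    quantile(q) { // cumsum over sorted bins, then report a bin-midpoint value
      const rank = q * (total - 1);
      let seen = 0;
      for (const k of [...counts.keys()].sort((a, b) => a - b)) {
        seen += counts.get(k);
        if (seen > rank) return (2 * gamma ** k) / (gamma + 1);
      }
    },
  };
}
```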
We can create leaflet-style tiles from our zoom tiles, and make a heatmap like https://alpercinar.com/open-cell-id/
[] explore applications to the [UCR Matrix Profile](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html)
**writeup:** Story is accelerating heatmaps; plot twist is the change in perspective from doing one thing per datum to doing one thing per bin.
Bin: for an associated column, we can have one cumsum column per aggregate bar, if we are OK with fixed discretizations for associated columns.
Q: would my approach work for fast parallel coordinates too? would we need an n-dim spatial index for n parcoords sliders? maybe even low-res would work just fine
- start the js library on a branch
- just the spatial index functions
- eventually, in the zig version, can return index ranges as a list of [lo, hi, i_0, i_1, ... i_n, lo, hi, ...] with lo-hi+2 telling you how many indices there will be in that range; but all stored in a contiguous u32.
- figure out how to bound the size of the index list better than 2 * number of bins
- i think # bins + w + h should be fine – can at least try and see...
- for now the "interface" to the spatial index is searchBefore / searchAfter. figure out how that looks in a zig world for maximum efficiency.
- for now aim to put the corest of core functionality into a js-only library that has no dependencies
- morton codes
- spatial search funcs
- note: rangeCheck will overflow if converting the maximum u32 value, so do an explicit check for it before saying hi+1
- do linear sort by allocating another array the size of the data (runnable sketch after this list)
- count the number of points in each hi_bits bin
- then, now knowing the offset information put each point into its bin (not yet sorted):
hi_bin = x >>> 16
offset = offsets[hi_bin] + count_so_far[hi_bin]
sorted[offset] = x & 0xffff
count_so_far[hi_bin] += 1
- then, go through each bin and use a scratch space of 2^16 u32 elements to count up elements then emit them into the array as u16, or
- (later): if the bin has >= 2 * 2^16 elements, convert that space to semantically represent a u32 array of counts. can simply memcpy the scratch space into the destination (need to patch up hi_offsets though i think, subtracting the difference from subsequent buckets... can be n^2, so do this more cleverly)
- i think this means we can sort our numbers in two passes, with the first step being just computing the hi_offsets
- it will be great to have this packed away & simplified.
- eventually can write a zig version of the preprocessor using the Arrow C interface: https://arrow.apache.org/docs/format/CDataInterface.html
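A runnable js version of the two-pass sort sketched in the list above, assuming 32-bit Morton codes split into a 16-bit hi bucket and a 16-bit remainder (the zig version would look structurally the same); `bucketSort` is a hypothetical name.

```js
function bucketSort(codes /* Uint32Array */) {
  const nBuckets = 1 << 16;
  // Pass 1: count points per hi bucket, then exclusive prefix sum -> offsets.
  const counts = new Uint32Array(nBuckets);
  for (const x of codes) counts[x >>> 16]++;
  const offsets = new Uint32Array(nBuckets);
  for (let sum = 0, b = 0; b < nBuckets; b++) {
    offsets[b] = sum;
    sum += counts[b];
  }
  // Pass 2: scatter each point into its bucket (buckets internally unsorted;
  // the per-bucket counting pass described above would finish the job).
  const sorted = new Uint16Array(codes.length); // low 16 bits only
  const placed = new Uint32Array(nBuckets);     // count_so_far
  for (const x of codes) {
    const hi = x >>> 16;
    sorted[offsets[hi] + placed[hi]++] = x & 0xffff;
  }
  return { offsets, sorted };
}
```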
could send another texture to the gpu containing the z index for each xy index, and compute that lookup array (xy indexes) with simd
- We could compute the bit planes of the raw data and use that to execute a binary search.
- The low bit planes could be stored as ranges, eg. for the first msb just store the position of the 0-1 split (remember, the data is sorted!)
- feels sorting network ish...
- good post on RRR for fast compressed bitmaps: https://www.alexbowe.com/rrr/
- An optimal algorithm for decomposing a window into maximal quadtree blocks: https://sci-hub.se/10.1007/s002360050160 (downloaded)
- speeding up zorder interleaving: https://lemire.me/blog/2018/01/09/how-fast-can-you-bit-interleave-32-bit-integers-simd-edition/
interpolative coding: https://github.com/jermp/interpolative_coding
- look into bitwise tries
- try to minimize the size of the morton codes... eg., pick the smallest codespace that disambiguates all of our points.
- is there some way to make the numbers smaller again? our integers span the full u32 range which makes them hard to turn into small ints or compress...
- On Slicing Sorted Integer Sequences: https://arxiv.org/pdf/1907.01032.pdf (about slicing the universe into chunks, rather than equal-data-length chunks)
- can we use a single type of binary search - NextGEQ-equivalent? Finds the index after the element if it exists. Not sure how this works with multiplicity; wondering if we can simply subtract 1 from the answer for the search of the next-biggest integer or something... Maybe that assumes multiplicity of 1.
**Note:** We can store 2^32 bits in 512 MiB uncompressed (2^32 bits = 2^29 bytes = 512 MiB). That's actually not bad as a choice for larger datasets, and it gives us O(1) rank queries if we figure out how to implement it.
- Compact Querieable Representations of Raster Data using a k^2 tree: https://users.dcc.uchile.cl/~gnavarro/ps/spire13.3.pdf
- Aggregated 2D Range Queries on Clustered Points using a k^2 tree: https://arxiv.org/pdf/1603.02063.pdf
- improving multi binary search (2014!) https://stackoverflow.com/questions/25699858/binary-search-for-multiple-distinct-numbers-in-a-large-array-in-minimum-number-o
- possible good explainer for implementing a simple fast bitmap with rank: file:///Users/yurivish/Downloads/practical-rankselect-queries-over-arbitrary-sequences.pdf
- actually kinda sus; they don't say how bad it is when it's bad (for very sparse bitsets)
- also https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.450.7991&rep=rep1&type=pdf - Fast, Small, Simple Rank/Select on Bitmaps
- points to PRACTICAL IMPLEMENTATION OF RANK AND SELECT QUERIES (https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.9548&rep=rep1&type=pdf) from the same authors when talking about fast rank
- 2018: SuRF: Practical Range Query Filtering with Fast Succinct Tries - https://dl.acm.org/doi/pdf/10.1145/3183713.3196931

- Another way to do 'most frequent per bin' is to precompute binnings for all the guys and just take the maximum one (even in a shader, but more likely use the precomputed weight cumsum vectors from the exploded form... uri|a, uri|b)
- for efficiency of additional attributes
- binned bitmaps? could eg. bin quantitative data into deciles or percentiles, and treat categorical as separate categories and hope they're not too high-cardinality... though tfeUri is.
- We should just say, in keeping with this [wiki section on in-memory bitmaps](https://en.wikipedia.org/wiki/Bitmap_index#In-memory_bitmaps): "We are going to traverse the raw data ahead of time, in order to build a bit index for each categorical column, then do bit operations on that to get the composite filter, then do a cumsum on that (we can do 'sum up these two u64s' with a lookup table of 64^2 sums in it, maybe)."
- Then that becomes the weight lookup vector to use when we render.
- Does this work for stuff like "most frequent"? How much space would it take if we had e.g. 2,000 of these bitmap indexes?
- Hmm. A lot (gigabytes) for a 300M-point dataset. So we could walk the raw data once to create the "bitmap" (made of u32), then cumsum it. Should still take less than a second, which is actually fine I guess...
- Then the question is: well, we want to do this to create a multi-category visualization, so we actually want all these bitmaps in memory at once. That's when we get a problem – 300M × 4 bytes is 1.2 GB, so each one is really costing us. Though it's a sorted integer set, with probably lots of runs if we consider the spatial nonuniformity of most data.
- [Bitmap Index vs. B-tree Index: Which and When?](https://www.oracle.com/technical-resources/articles/sharma-indexes.html)
- [Efficient binning for bitmap indices on high-cardinality attributes](https://escholarship.org/uc/item/54h9r8gq)
- wiki: https://en.wikipedia.org/wiki/Bitmap_index#In-memory_bitmaps
- [Space Efficient Bitmap Indexing](https://dl.acm.org/doi/pdf/10.1145/354756.354819)
- _What if everything was one big Morton code?_ 🌌
- Scatterplot is a 2D histogram. Sidebar is 1D histograms. We want to be able to choose which dimensions go in the main view and which go in the sidebar as conditional distributions based on the current selections (selection = zoom level, or user-specified selection in the scatter plot, or [Crossfilter](http://crossfilter.github.io/crossfilter/)-like selection in the other views). Could even have multiple scatterplots, maybe even a density SPLOM... But wtf indexing do we do for _that_?
- adaptive color scale
- histogram equalization: datashader folks swear by it (what do they use?)
- https://roadtolarissa.com/coloring-maps/
- quantile is not ideal; can obscure patterns in the data
- jenks natural breaks
- we want to do **incremental** computation in both the bin-based technique and data-projection-based technique.
- for data, bin N points then provide a new normalized bins array containing those points; can show progress bar.
- for bins, can bin the bottom-left and top-right of the whole screen first to get the count for the whole screen, then do the next level of refinement, then the next; or perhaps fill in from the bottom left going upwards, so we get the nicer contiguity of queries and less binary searching overall.
- not sure how this would work for more sophisticated range query stuff where it's kinda expensive to do the range query (eg. most frequent element)
// browsing RPCs: would be nice to browse a specific square and view the distribution of routes in that square, or distribution of all other metrics within that square (cf. the cell tower demo)
// tallying the most frequent element in a column of strings: there is a fixed key set, and the arrow dictionary knows it, and knows how to translate indices to strings; could build the reverse lookup table and use that as part of a more efficient "most frequent element" reducer
- read this thread for high-performance DOM: https://twitter.com/slightlylate/status/1562292999933001728
- can we bucket sort our 32 bit ints since we know they span more or less the whole range?
- binary search variants: https://github.com/scandum/binary_search (look at "Adaptive Binary Search")
- milan's multi-value binary search: https://github.com/juliusmilan/multi_value_binary_search/blob/master/mmkbs_draft.pdf
- Could we make an LEHD map?
- Instead, use it to link up pickup/dropoff locations!
- inspired by this example https://twitter.com/benmschmidt/status/1561720471497875456
- xi takeaways
- groupby v important
- filter v important
- layers important/useful; lots of problems come back to "bin this subset of points and visualize it as its own layer"
- lots can be done with a weight vector;
- compute weights however we want, then cumsum & use that to visualize the total weights per each bin.
- q: what other operations can we do other than sum?
- one kind of 'associated column': pointing to another data point. Eg. highlight some data points, then highlight in another color all of the ones they are related to.
- eg. for event data, "this event caused this other one" so we want to click an event or highlight a bunch of events, then see who caused them. Kinda like LEHD.
- the dimension of "cpu utilization" is surprisingly complex – it's not quite ordinal; more like interval data; you only have 100 values (or maybe you bin into different-sized cpu utilization bins). But you don't really want to zoom into it... And the position along this dimension is definitely not a point. _And_ it's not inherently power of 2 sized; so how to conver it to morton codes is an interesting question. Could code each point as a power of 2 interval, actually... Hm. And now we need to somehow figure out how to handle the aspect ratio issues this raises.
- **principle**: always be able to get back to the raw data. The key to this is _indices_ -- we are always able to draw a correspondence between a power-of-two box and the underlying index range in the raw data. This is what lets us add sample points, plot disaggregated points, do remote data fetches, get associated data, etc.
- **principle**: Scale to zero. A small dataset should also just work, and work well. No bins unless there are super-high-density regions. And interaction, selection, hover, sample labels for one point per high-level bin, etc.
- **principle**: 1d is as important as 2d. Single-dimensional histograms are also a key use case, eg. crossfilter.
**need to flip y around the global min/max coordinate system, not the local window!!!**
- we can have one grid for display and another for "overlaid examples" - so we always have high-level samples from across the full image. Kinda like Snapseed, if our dataset was one of images. Ask Moritz if he's still got his insta data around. Could basically do a low res binning to show the images, and also show density as white glowing blobs or such (or as an opacity underlay).
- observation: When we do a collection of binary searches, they are all at the same level. meaning: each of the queries has their lower k bits zeroed.
- Similarly, each has a common prefix (possibly length zero, but often not). So we are actually only searching within some constrained bit range.
- We kinda get clusters for free: don't split a quadtree bb if the #points > too many; label it with how many points. Otherwise, draw the points directly. So we can always draw outlier regions, while still keeping as a heatmap those areas with too much density. Though we'd need a per-tile limit since otherwise hard to say globally how many points are on screen (and thus how many are represented by a particular bin).
- Or we could compute counts for each bin, THEN decide based on a total "point budget" to draw the bins from lowest to highest (bucket sort or such)! (Tiny sketch after this list.)
- like https://www.koalastothemax.com/
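A tiny sketch of the point-budget idea above: spend a global budget on the sparsest bins (drawn as raw points) and keep the dense remainder binned. `splitByBudget` is a hypothetical name.

```js
function splitByBudget(binCounts /* Uint32Array */, budget) {
  // Visit bins from sparsest to densest.
  const order = [...binCounts.keys()].sort((i, j) => binCounts[i] - binCounts[j]);
  const drawAsPoints = [];
  for (const b of order) {
    if (binCounts[b] > budget) break; // the rest stay heatmap bins
    budget -= binCounts[b];
    drawAsPoints.push(b);
  }
  return drawAsPoints; // bin indices whose points get drawn individually
}
```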
- sorted integer compression with O(1) random access: Elias-Fano: https://www.antoniomallia.it/sorted-integers-compression-with-elias-fano-encoding.html#fn:fn3 (hedged sketch at the end of this list)
- related: https://arxiv.org/abs/1206.4300 apparently says how to do "select the i-th element" via "efficiently find the i-th set bit", according to [this paper](https://drops.dagstuhl.de/opus/volltexte/2017/7324/pdf/LIPIcs-CPM-2017-30.pdf)
- also there is a "partitioned Elias-Fano coding" which improves efficiency by batching "related" segments of data (1d clusters?)
- rank and select: https://helda.helsinki.fi/bitstream/handle/10138/316600/Timonen_Jussi_gradu_2020.pdf?sequence=3&isAllowed=y
> ...returns the index of the i-th occurrence of x in the data. In this thesis, these two operations are assumed to work with a bit array. Rank1(i) counts the number of bits set to 1 before i and select1(i) returns the index of the i-th bit set to 1. Both operations can be implemented to work in constant time (see, e.g., Gog et al., 2014). [yuri: I think [this](https://people.eng.unimelb.edu.au/sgog/optimized.pdf) is Gog 2014]
- Gog (2014) says: "Unfortunately, in practice, the runtime of operations on succinct data structures tends to be slower than the original data structure they emulate. **Succinct data structures should therefore only be used where memory constraints prohibit the use of traditional data structures.**"
- Maybe a **linear scan** over the bit array would be warranted, if we get to resolve the full set of "binary searches" that we want to do in one go.

- This seems like a nice paper: https://arxiv.org/pdf/1206.4300.pdf
- Could use SIMD (though Safari doesn't support it yet)
- if we do range splitting, I think this makes our problem simpler, though I still struggle to articulate it cleanly.
- input query: find n
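To tie the Elias-Fano references above together, a hedged sketch (assuming n > 0 sorted nonnegative integers from a universe of size u, with a scan-based select1; doing that select fast is exactly what the references above are about). All names are hypothetical.

```js
function efEncode(values /* sorted ascending */, u) {
  const n = values.length;
  // ~log2(u/n) low bits per value, stored verbatim; high bits go in unary.
  const lowBits = Math.max(0, Math.floor(Math.log2(u / n)));
  const lowDiv = 2 ** lowBits;
  const lower = new Uint32Array(n); // left unpacked for simplicity
  const upper = new Uint8Array(n + Math.ceil(u / lowDiv) + 1); // one byte/bit
  for (let i = 0; i < n; i++) {
    lower[i] = values[i] % lowDiv;
    upper[Math.floor(values[i] / lowDiv) + i] = 1; // i-th 1 after its high part
  }
  return { lower, upper, lowDiv };
}

function efAccess({ lower, upper, lowDiv }, i) {
  // select1(upper, i) as a linear scan.
  let seen = -1, pos = 0;
  while (seen < i) {
    if (upper[pos++]) seen++;
  }
  const high = pos - 1 - i; // subtract the i earlier 1 bits
  return high * lowDiv + lower[i];
}
```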
- Try [interpolation search](https://en.wikipedia.org/wiki/Interpolation_search); maybe adapting somehow to the nonuniformity of the data as we discover it. We have a fixed, finite space and maybe there's some useful universal assumption about "interesting" data, ie. clustered... Make it work worse on uniformly distributed data, since it's less common.
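A sketch of plain interpolation search for the note above: guess the next probe by linearly interpolating between the endpoints instead of always bisecting. Roughly O(log log n) expected on uniform data, degrading toward O(n) on skewed data, which is the trade-off the note is weighing.

```js
function interpolationSearch(a /* sorted ascending */, target) {
  let lo = 0, hi = a.length - 1;
  while (lo <= hi && target >= a[lo] && target <= a[hi]) {
    // Estimate the position from the value's fraction of the [lo, hi] span.
    const mid = a[hi] === a[lo]
      ? lo
      : lo + Math.floor(((target - a[lo]) * (hi - lo)) / (a[hi] - a[lo]));
    if (a[mid] === target) return mid;
    if (a[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1; // not found
}
```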
- make overview "minimap" showing where you are in the scale of the big map.
- for geo, could make a custom mapbox layer.
- marginal histograms / marginal heatmaps / marginal annotations
- Lemire on interleaved binary search:
1. https://lemire.me/blog/2019/09/14/speeding-up-independent-binary-searches-by-interleaving-them/
2. https://lemire.me/blog/2019/09/20/how-far-can-you-scale-interleaved-binary-searches/
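A sketch of the interleaving idea from the posts above: advance many independent binary searches one bisection step at a time so their memory loads can overlap, instead of letting each search stall alone on cache misses. This is a scalar approximation of Lemire's trick, not his code; the name is hypothetical.

```js
function interleavedLowerBound(a /* sorted */, targets) {
  const k = targets.length;
  const lo = new Int32Array(k);
  const hi = new Int32Array(k).fill(a.length);
  let active = k;
  while (active > 0) {
    active = 0;
    for (let q = 0; q < k; q++) { // one step per query per round
      if (lo[q] < hi[q]) {
        const mid = (lo[q] + hi[q]) >>> 1;
        if (a[mid] < targets[q]) lo[q] = mid + 1;
        else hi[q] = mid;
        if (lo[q] < hi[q]) active++;
      }
    }
  }
  return lo; // lo[q] = index of the first element >= targets[q]
}
```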
- compressing the bit planes of the delta-encoded Morton codes for 1 billion OpenStreetMap (lng,lat) points results in a 33 MB file. This suggests that the raw 4 GB of codes can be **more compactly represented** even client-side, for much faster binary searching...
- we can visualize a stratified random sample of points sampled from a low resolution binning. We know the weight of each bin so we can even sample representatively or not depending on what we want to do.
- sampling from each bin is an instance of the principle of **showing examples**. There are 5,600 points in this bin; for example, this one: ...
### Projects
- dog breed-name-neighborhood crossfilter
### References
- umap of arxiv: https://observablehq.com/@bmschmidt/arxiv?collection=@bmschmidt/deep-scatterplots
- Analyzing video game deaths (really cool vis!): https://cgcooke.github.io/Blog/datashader/visualisation/pubg/2020/05/31/Visualising-PUBG-Deaths-With-Datashader.html
- [GLSL Gaussian Blur](https://observablehq.com/@stwind/glsl-gaussian-blur) by @stwind
- [Anti-Aliased Grid Shader](https://madebyevan.com/shaders/grid/) by @evanw
- [2D Surface Reconstruction: Marching Squares With Meta-balls](https://iradicator.com/2d-surface-reconstruction-marching-squares-with-meta-balls/)
- [Function Plot (with time (and contours))](https://observablehq.com/@rreusser/function-plot-with-time-and-contours) by @rreusser. This notebook uses the cube helix implementation from there.
- https://iquilezles.org/articles/hwinterpolation/
- https://bartwronski.com/2020/04/14/bilinear-texture-filtering-artifacts-alternatives-and-frequency-domain-analysis/
- https://iquilezles.org/articles/texture/
- [Efficient Gaussian blur with linear sampling](https://www.rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/)
- https://observablehq.com/@s4l4x/efficient-gaussian-blur-with-linear-sampling
- [glfx: Triangle Blur](https://github.com/evanw/glfx.js/blob/master/src/filters/blur/triangleblur.js) We use this blur by @evanw
- morton nearest neighbor searching (see LowDimNearestNeighbors.jl)
- https://snorrwe.onrender.com/posts/morton-table/#range-query-splitting on linear quadtrees and range splitting
- https://github.com/snorrwe/morton-table/tree/master/morton-table
- https://github.com/mapbox/supercluster
- https://github.com/mourner/kdbush (also sort-based)
- https://alpercinar.com/open-cell-id/
- https://fgiesen.wordpress.com/2009/12/13/decoding-morton-codes/
- litmax / bigmin from https://twitter.com/jonahharris/status/1337087177591820290/photo/1 used with permission of the author (who says he got it from BSD-licensed code somewhere)
- using the idea of stopping when ranges are 'small enough' from https://snorrwe.onrender.com/posts/morton-table/#range-query-splitting
- https://aws.amazon.com/blogs/database/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-1/ just interesting reading, plus the followup; gives the idea of just doing range optimization/query planning in z space without worrying about the data distribution when deciding whether to split or not
- datashader tutorial: https://examples.pyviz.org/census/census.html with data at http://s3.amazonaws.com/datashader-data/census2010.parq.zip
- https://github.com/juliusmilan/multi_value_binary_search
- on binary search and branch mispredictions: https://pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/
- also on binary search, walking through optimizations: https://en.algorithmica.org/hpc/data-structures/binary-search/
- billion point dataset (too large to load, even just 32-bit morton codes!) https://examples.pyviz.org/osm/osm-1billion.html
- more examples: https://examples.pyviz.org/
- vis the uber dataset w/ separate layers for pickup and dropoff: https://www.youtube.com/watch?v=n4cFwPan59I
- data: https://s3.amazonaws.com/datashader-data/nyc_taxi_wide.parq
- There's some connection here to sparse arrays. Right now we store one entry per datapoint. We could instead aggregate, storing one entry per Morton code, i.e. per lowest-level bin. This means we have a max of 2^32 "bins", or >2 gigapixels. Most have zero weight, and so do not need to be stored. Probably lots have a weight of 1, but treating that case differently would be an optimization. We would need to store the weight of each point alongside it, so this optimization only makes sense if the image is sparse enough.
- Another stray thought that seems related: if you only care about presence or absence then we can store a BitArray of 2^32 booleans.
- For multiple "channels", we can either store multiple parallel weight arrays (eg. one per race in the census data), or can partition the data into separate arrays by race, so that each partition stands alone and can be binned and counted independently.
- We're really just sending a compact representation of a 65,536×65,536 image.
- In the case where we aggregate duplicate codes, the parallel cumulative array of weights also encodes indices into the original array of the data range that this came from. This could be useful for e.g. looking up granular data based on the bins.
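A one-function sketch of that last observation: once duplicate codes are aggregated, the cumulative weight array doubles as an index map back into the Morton-sorted data. The name is hypothetical.

```js
function binToIndexRange(cumWeights, k) {
  // Aggregated bin k came from original rows [lo, hi), half-open.
  return [k === 0 ? 0 : cumWeights[k - 1], cumWeights[k]];
}
```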
- xi
think about how this works with a time dimension

https://www.windytan.com/2013/03/rendering-pcm-with-simulated-phosphor.html

x = time
y = cpu time
z = thread
"amount of time this thread went to sleep or used cpu"
thread a wakes up b, b comes back

within a minute, how many thread ids would be active?
thread id is 0-32 so one real thread gets different ids over time

event data

and events have connections, relationships

"tell me where is the 99th percentile request"

is 1 request

getting 99th per


click one point to highlight other points that are "the same" or related



it gives you the chance to see the individuals - not just aggregate



vector multiply cumulative sum arrays? would that work? probably not; a .* b



one point wakes up a second point

"give me all the wakeups that happen during while an event is being processed"

click on one point => highlight related points;
just
quantiles
if you are zoomed in, you can just compute the weights for that set of points

groupby is v important
- cpu utilization; bins of 0-100 along that dimension
- host
- latency vs. utilization

- density map - have very rectangular
x is 0-100 util
latency is 0-whatever
so you get nonsquare?

the set size of categories for platform is maybe 20 - f5, etc.
most of the time for a lot of data, very efficient with precomputed dimensions
usually the data is bigger than the catalog size

crazy thing he worked on:
high frequency profiling
gigs of samples per seconds
one terabyte of data
samples something every 200 cycles
one sample per 24ns

so he had to sample the data down to get a handle on it

filter is v important

