Unlisted
Edited
Sep 30, 2023
Insert cell
Insert cell
Say we have an array of *n* sorted integers, with maximum integer *u* and repetitions allowed.

For example, for the array below we have _n_ = ${n} elements and the largest one is _u_ = ${u}

```
${JSON.stringify(a)}
```

In a language with integer types these would usually be represented in a fixed-width encoding such as `u8` or `u32` that represents each number in base 2. If you know your numbers to span the entire range of the encoding (ie. for `u8` your maximum number is indeed 2^8-1 = 255) this can be a reasonable choice, since on average it takes 8 bits to communicate the value of an 8-bit integer chosen uniformly at random. The amount of information in a random 8 bit integer is 8 bits, so the amount of information in an array of random 8-bit integers is _n_ * 8 bits.

But in our case we know for a fact that our integers are sorted, and in addition our _u_ value might not be a power of 2. both of these special cases suggest that we might be able to do better than a fixed-width encoding.

That's where Elias Fano comes in – it's an encoding that allows us to represent our sorted array more compactly.

The basic idea is that the sorted nature of our array lets us take advantage of _bucketing_

The Elias-Fano exploits the sorted nature of our sequence by *bucketing* – it groups the sequence into contiguous buckets and stores the counts for each bucket, instead of storing the numbers themselves.

It stores the count (high bits) and an offset (low bits) for each entry in the bucket.

Draw an analogy to the same idea in base 10
Insert cell
// todo: check how this lines up with what I was doing
image = FileAttachment("image.png").image()
Insert cell
FileAttachment("ma2021.pdf").url() // image is from this paper
Insert cell
data = a.flatMap((d, i) =>
Array.from(d.toString(), (c, ci) => ({ d, i, c, ci }))
)
Insert cell
Plot.plot({
marks: [
//
Plot.text(data, { text: "c", x: "ci", y: "i", fontSize: 14 })
],
width: 3z0,
margin: 15,
y: { reverse: true },
axis: null
})
Insert cell
a = [0, 2, 5, 5, 10, 11, 15, 17]
Insert cell
2 ** 4
Insert cell
n = a.length
Insert cell
u = a[n - 1]
Insert cell
lo = floor(log2(u / n))
Insert cell
hi = ub - lo
Insert cell
ub = ceil(log2(u)) // u bits
Insert cell
n * ub
Insert cell
nb = n + (u >>> lo) + n * lo // number of bits for the entire sequence
Insert cell
bpe = nb / n
Insert cell
Insert cell
floor = Math.floor
Insert cell
ceil = Math.ceil
Insert cell
log2 = Math.log2
Insert cell
- https://rossanoventurini.github.io/papers/CPM17.pdf
- https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29
- https://vigna.di.unimi.it/ftp/papers/QuasiSuccinctIndices.pdf
- https://shonan.nii.ac.jp/archives/seminar/029/wp-content/uploads/sites/12/2013/07/Sebastiano_Shonan.pdf
Insert cell

Purpose-built for displays of data

Observable is your go-to platform for exploring data and creating expressive data visualizations. Use reactive JavaScript notebooks for prototyping and a collaborative canvas for visual data exploration and dashboard creation.
Learn more