Public
Edited
Jan 4, 2024
1 fork
Insert cell
# Perceus: Garbage Free Reference Counting with Reuse
Insert cell
Immutability is an amazing property, because it helps make reasoning about programs more local. But a naive implementation of immutability comes at the cost of an enormous amount of memory usage. Garbage collectors were invented to handle just this problem, but they present a tradeoff with other memory management techniques like reference counting. (e.g. GC performance is harder to tame/predict than RC)

Perceus automatically annotates programs with reference counting operations, and presents some optimizations to make it even faster. Crucially, the programmer need not think about this process (most of the time) unless they want to take even more advantage of the strategy using the functional-but-in-place (FBIP) idiom.

How does this method work? That's what I'm going to try to unpack in this post. In particular, Perceus uses some notions of ownership, borrowing, and (implicitly) non-lexical lifetimes. These ideas are also present in Rust, which does not use GC or RC, but determines when to allocate and deallocate statically. (Right?????) How do these concepts compare and contrast in the two systems?
Insert cell
## Abstract Machine Model

For this paper, we'll use a slightly modified version of the LC used in Perceus. In particular, our machine will have a stack, a heap, and a currently evaluated expression with a continuation stack. Pretty standard stuff. Pointers into the heap will maintain reference counts. Because we do not have mutable references, there won't be any cycles in the heap (what about recursive functions???)
Insert cell
## When Does Allocation Happen?

Allocation happens when we evaluate an expression literal to a value. i.e. when we evaluate a lambda to a closure or a constructor expression to a constructor.
Insert cell
## Ownership, Borrowing, and the Rules

Counting Immutable Beans and Perceus differ in the way they think of ownership and borrowing. Immutable Beans is closer to Swift's ARC, and uses strong pointers (owners) and weak pointers (borrowers). Strong pointers are responsible for, on net, consuming one reference count token. How do you *consume* a token? You consume a token by either decrementing the reference count or using the variable in an expression exactly once. If you want to use a variable more than once, you have to add a token using an increment before using it.

Weak or borrowed references do not contribute to token usage. They essentially act like immutable borrows in rust. Borrowed references cannot call dec or inc. They cannot consume variables, that is they cannot dec, return, or pass to functions that take owned arguments.


Perceus works similarly, but with a more flexible concept of borrowing that allows for a simplified semantics. Just as before, an expression that owns the variables `x` is responsible for, on net, consuming one of its reference count tokens. However, "weak pointers" are much stronger. They are responsible for, *on net,* not consuming any RC tokens. However, this means that borrows *can* mess with reference counts as long they don't decrement anything (since that would cause them to fall out of scope).

In the actual implementation of the Perceus source-to-source translation, owners *never increment* and borrows (of course) *never decrement*. I do not know what operations are allowed on borrowed references in Immutable Beans.

I think it is more appropriate to call the contexts in Perceus, **persist** and **consumed**. That is, variables in the *persist* set last beyond the current expression's evaluation (and their ref count is unchanged), and variables in the *consumed* set have their ref count *decremented.* All free variables in the expression are either persistent or consumed, and all consumed variables must be free in the expression (but persistent variables need not be in the free variable set).



### Variables
If we own the variable, do nothing (since variables are linear by default). (How does it get dropped?????) (Is there some sort of create/destroy cancellation? I think so...)
If we are borrowing the variable, we need to duplicate the reference so we can hold on to it.

### Lambda
Consider `\x. e`. Let `ys := fv(\x. e)`.

First, if `x` isn't free in `e`, we can drop it immediately in the body of the lambda.
Next, any `ys` that we do not own, we must be borrowing and so we have to duplicate them.
Finally, when we visit the body of the lambda, we temporarily transfer ownership of `ys` and `x` (if it is not dropped) to `e`. I *think* this ownership is returned later...
Insert cell
\Delta \mid \Gamma \vdash_s e \rightsquigarrow e'
Insert cell
Insert cell
Insert cell
Insert cell
Insert cell
Insert cell
Insert cell
## Functions
Insert cell
Insert cell
Insert cell
The body consumes its free variables and its function argument (if it is used). To maintain the invariant, persistent free variables are duplicated. If the function argument is not used it must be consumed by a drop.
Insert cell
x \in \textsf{fv(}e\textsf{)} \implies \emptyset \mid ys, x \vdash_s e \rightsquigarrow e' \land \texttt{dup } \Delta_{\text{free}}; \lambda x.\ e'
Insert cell
x \not\in \textsf{fv(}e\textsf{)} \implies \emptyset \mid ys \vdash_s e \rightsquigarrow e' \land \texttt{dup } \Delta_{\text{free}}; \lambda x.\ (\texttt{drop } x; e')
Insert cell
## Application
Insert cell
How is the notion of passing ownership formalized? Variables are consumed eagerly i.e. as early as possible. This applies to application, binding, and pattern matching. So to fulfill its consumption obligation, the expression passes it on to its subexpressions.

If a variable is borrowed, it is duplicated and then passed to the lambda for consumption

Question is who is responsible for consuming the input? outside the subexpressions or inside them? If outside, then must duplicate early, pass in, then drop. If inside, then can pass the duplication operation inside.
Insert cell
Insert cell
Insert cell
Insert cell
// TODO: need to collect heap references from the zipper!!!
add_heap_refs = (orig_data) => {
return {
...orig_data,
trace: orig_data.trace.map(({frames, heap}) => {
let pointers = [];
heap.forEach(([rc, hv], i_h) => {
if (hv.TAG === 0) {
hv._0.forEach(([x, v], i_hv) => {
pointers.push({src: ref(`../../heap/elements/${i_h}/heapValue/stack/elements/${i_hv}/v`), tgt: ref(`../../heap/elements/${v._0}`)})
})
}
});
return {
frames: frames.map(({zipper, stack}, i_f) => {
stack.forEach(([x, v], i_s) => pointers.push({src: ref(`../../frames/${i_f}/stack/elements/${i_s}/v`), tgt: ref(`../../heap/elements/${v._0}`)}));
return {
zipper,
stack: stack.map(([x, v]) => ({x, v, hRef: ref(`../../../../../heap/elements/${v._0}`)})),
};
}),
heap: heap.map(([rc, hv]) => [rc, hv.TAG === 0 ? {...hv, _0: hv._0.map(([x, v]) => ({x, v, hRef: ref(`../../../../../${v._0}`)}))} : hv]),
pointers,
};
})
};
}
Insert cell
render(data.trace[7], config())
Insert cell
process_data = (orig_data) => ({
...orig_data,
trace: orig_data.trace.map(({frames, heap, pointers}) => ({frames: frames.map(({zipper, stack}) => ({zipper, stack: mkList(stack)})), heap: mkList(heap.map(([rc, hv]) => [rc, hv.TAG === 0 ? {...hv, _0: mkList(hv._0)} : hv])), pointers}))
})
Insert cell
data = process_data(add_heap_refs(orig_data3))
Insert cell
// function lambdaShape() {
// return createShapeFn({
// shapes: {
// "{": M.text({contents: "{", fontSize: "18px"}),
// "}": M.text({contents: "}", fontSize: "18px"}),
// },
// fields: {
// "arg": createShapeFn(arg => M.text({contents: `fn (${arg})`, fontSize: "18px"})),
// "body": createShapeFn(body => rcexprGlyph()(body)),
// },
// rels: {
// "arg->{": [C.alignCenterY, C.hSpace(5)],
// "{->body": [C.alignCenterY, C.hSpace(5)],
// "body->}": [C.alignCenterY, C.hSpace(5)],
// }
// })
// }
Insert cell
/*
| Val(value)
| Var(id)
| Lam(id, serializeRCExpr)
| App(serializeRCExpr, serializeRCExpr)
| Let(id, serializeRCExpr, serializeRCExpr)
| Match(serializeRCExpr, array<(pat, serializeRCExpr)>)
// ref counting exprs introduced by translation/lowering and at runtime
| Dup(id, serializeRCExpr)
| Drop(id, serializeRCExpr)
*/

function rcexprGlyph() {
return (e) => {
switch (e.TAG) {
case 0: return M.text({contents: "Val"});
case 1: return M.text({contents: e._0, fontSize: "18px"});
case 2: return createGroup({
"{": M.text({contents: "{", fontSize: "18px"}),
"}": M.text({contents: "}", fontSize: "18px"}),
$arg$: arg => M.text({contents: `(${arg}) =>`, fontSize: "18px"}),
$body$: body => rcexprGlyph()(body),
// "body": rcexprGlyph(),
},
{
"arg->{": [C.alignCenterY, C.hSpace(5)],
"{->body": [C.alignCenterY, C.hSpace(5)],
"body->}": [C.alignCenterY, C.hSpace(5)],
}
)({arg: e._0, body: e._1});
case 3: return M.text({contents: "App"});
case 4: return M.text({contents: "Let"});
case 5: return M.text({contents: "Match"});
case 6: return M.text({contents: "Dup"});
case 7: return createGroup({
"dec": M.text({contents: "dec", fontSize: "18px"}),
";": M.text({contents: ";", fontSize: "18px"}),
$x$: x => parens(M.text({contents: x, fontSize: "18px"})),
$e$: rcexprGlyph(),
},
{
"dec->x": [C.alignCenterY, C.hSpace(0)],
"x->;": [C.alignCenterY, C.hSpace(0)],
";->e": [C.alignCenterY, C.hSpace(5)],
}
)({x: e._0, e: e._1});
}
}
}
Insert cell
function pointer() {
// return (e) => M.text({contents: `@${e._0}`, fontSize: "18px"});
return (e) => createGroup({
"circle": M.circle({r: 12, fill: "cornflowerblue"}),
"text": M.text({contents: `${e._0}`, fontWeight: "bold", fontSize: "16px", }),
},
{
"circle->text": [C.alignCenter, C.alignMiddle],
}
);
}
Insert cell
function value() {
return (e) => {
switch (e.TAG) {
// pointer
case 0: return pointer()(e)
// num
case 1: return M.text({contents: `${e._0}`, fontSize: "18px"})
}
}
}
Insert cell
render({TAG: 0, _0: "foo"}, value())
Insert cell
/*
type heapValue =
| Clo(stack, id, rtexpr)
// TODO: there's something different about the semantics of Con in Perceus...
| Con(id, list<value>)
*/

function heapValue() {
return (e) => {
switch (e.TAG) {
case 0: return createGroup({
"{": M.text({contents: "{", fontSize: "18px"}),
"}": M.text({contents: "}", fontSize: "18px"}),
$stack$: stack(),
$arg$: arg => M.text({contents: `(${arg}) =>`, fontSize: "18px"}),
$body$: body => rcexprGlyph()(body),
},
{
"stack->arg": [C.alignCenterY, C.hSpace(5)],
"arg->{": [C.alignCenterY, C.hSpace(5)],
"{->body": [C.alignCenterY, C.hSpace(5)],
"body->}": [C.alignCenterY, C.hSpace(5)],
}
)({stack: e._0, arg: e._1, body: e._2})
case 1: return M.text({contents: "constructor", fontSize: "18px"});
}
}
}
Insert cell
function parens(s) {
return createGroup({
"(": M.text({contents: "(", fontSize: "18px"}),
"expr": s,
")": M.text({contents: ")", fontSize: "18px"}),
},
{
"(->expr": [C.alignCenterY, C.hSpace(0)],
"expr->)": [C.alignCenterY, C.hSpace(0)],
}
)
}
Insert cell
function box(s, stroke="magenta") {
return createGroup({
"box": M.rect({ fill: "none", stroke}),
"expr": s,
},
{
"box->expr": [C.sameWidth, C.sameHeight, C.alignTop, C.alignLeft],
}
)
}
Insert cell
/*
| Val(value)
| Var(id)
| Lam(id, serializeRTExpr)
| App(serializeRTExpr, serializeRTExpr)
| Let(id, serializeRTExpr, serializeRTExpr)
| Match(serializeRTExpr, array<(pat, serializeRTExpr)>)
// ref counting exprs introduced by translation/lowering and at runtime
| Dup(id, serializeRTExpr)
| Drop(id, serializeRTExpr)
// drop by ptr instead of by variable
| DropPtr(pointer, serializeRTExpr)
// adapted from Oxide. maybe there's a "nicer way"TM to do this idk
// introduced at runtime by the abstract machine
// lexical scope for stack
| Shift(serializeRTExpr) // handles let. pushes/pops a single binding to/from current frame
| Hole
*/

function rtexprGlyph() {
return (e) => {
switch (e.TAG) {
case 0: return value()(e._0);
case 1: return M.text({contents: e._0, fontSize: "18px"});
/* would kind of like to be able to write a function here and then immediately apply it? GF.mk(...)(e). So that means that I want GF.mk to return a function from data to Glyph, which it currently doesn't do. */
case 2: return createGroup({
"{": M.text({contents: "{", fontSize: "18px"}),
"}": M.text({contents: "}", fontSize: "18px"}),
$arg$: arg => M.text({contents: `(${arg}) =>`, fontSize: "18px"}),
$body$: body => rcexprGlyph()(body),
},
{
"arg->{": [C.alignCenterY, C.hSpace(5)],
"{->body": [C.alignCenterY, C.hSpace(5)],
"body->}": [C.alignCenterY, C.hSpace(5)],
}
)({arg: e._0, body: e._1});
case 3: return createGroup({
$fn$: (fn) => parens(rtexprGlyph()(fn)),
$arg$: (arg) => parens(rtexprGlyph()(arg)),
},
{
"fn->arg": [C.alignCenterY, C.hSpace(5)],
}
)({fn: e._0, arg: e._1});
case 4: return M.text({contents: "Let"});
case 5: return M.text({contents: "Match"});
case 6: return M.text({contents: "Dup"});
case 7: return createGroup({
"dec": M.text({contents: "dec", fontSize: "18px"}),
";": M.text({contents: ";", fontSize: "18px"}),
$x$: x => parens(M.text({contents: x, fontSize: "18px"})),
$e$: rtexprGlyph(),
},
{
"dec->x": [C.alignCenterY, C.hSpace(0)],
"x->;": [C.alignCenterY, C.hSpace(0)],
";->e": [C.alignCenterY, C.hSpace(5)],
}
)({x: e._0, e: e._1});
case 8: return createGroup({
"dec": M.text({contents: "dec", fontSize: "18px"}),
";": M.text({contents: ";", fontSize: "18px"}),
// "x": x => parens(M.text({contents: `@${x}`, fontSize: "18px"})),
$x$: x => parens(pointer()({_0: x})),
$e$: rtexprGlyph(),
},
{
"dec->x": [C.alignCenterY, C.hSpace(0)],
"x->;": [C.alignCenterY, C.hSpace(0)],
";->e": [C.alignCenterY, C.hSpace(5)],
}
)({x: e._0, e: e._1});
case 9: return M.text({contents: "Shift"});
case 10: return M.text({contents: "Hole"});
}
}
}
Insert cell
render(data.trace[0].frames[0].zipper.focus, rtexprGlyph())
Insert cell
/*
type ectxt =
| AppL(rtexpr) -- E e
| AppR(pointer) -- x E
| Let(id, rtexpr) -- let x = E; e
| Match(list<(pat, rtexpr)>) -- match E { pi -> ei }
| Shift
| Frame
*/

function zipper() {
return (e) => {
return e.ectxts.reduce((s, e) => {
switch (e.TAG) {
case 0: return createGroup({
$fn$: (fn) => box(parens(fn)),
$arg$: (arg) => parens(rtexprGlyph()(arg)),
},
{
"fn->arg": [C.alignCenterY, C.hSpace(5)],
}
)({fn: s, arg: e._0});
case 1: return createGroup({
$fn$: (fn) => parens(pointer()(fn)),
$arg$: (arg) => box(parens(arg)),
},
{
"fn->arg": [C.alignCenterY, C.hSpace(5)],
}
)({fn: e, arg: s});
default: return M.text({contents: "NYI"});
}
}
, rtexprGlyph()(e.focus))
}
}
Insert cell
render(data.trace[0].frames[0].zipper, zipper())
Insert cell
render(data.trace[5], config())
Insert cell
/*

TODO: need to hoist the pointers above the stack, because the stack is trying to close over the heap, which isn't what we want

*/

function stack() {
return createGroup({
// renderFn: M.debug,
// TODO: this is a really gross hack that somehow gets the box to render properly. without invisible text it breaks!!
"text": M.text({contents: "stack", fontSize: "0px"}),
"box": M.rect({fill: "none", stroke: "black"}),
/* TODO: switch back to this! */
// "elements": createShapeFn({
// // bbox: {
// // left: Math.random() * 10,
// // top: Math.random() * 10,
// // },
// shapes: {
// "arrow": M.arrow({ stroke: "cornflowerblue", strokeWidth: 3 }),
// // 2: M.nil(),
// // "hRef": M.nil(),
// // "hRef": /* M.text({contents: `${hRef ? hRef.path : 'null'}`, fontsize: "18px"}), */hRef,
// },
// fields: {
// "x": (x) => M.text({contents: x, fontSize: "18px"}),
// "v": value(),
// },
// rels: {
// "x->v": [C.hSpace(5), C.alignCenterY],
// "v->arrow/start": [C.alignCenter, C.alignMiddle],
// "arrow/end->hRef": [C.alignCenter, C.alignMiddle],
// }
// }),
// "elements": createShapeFn({
// // object: (x) => M.text({contents: `${5}`})
// shapes: {
// "foo": M.nil(),
// },
// }),
// "elements": (x) => M.text({contents: `${Object.keys(x)}`}),
$elements$: /* ({x, v, hRef }) => */createGroup({
// bbox: {
// left: Math.random() * 10,
// top: Math.random() * 10,
// },
$x$: (x) => M.text({contents: x, fontSize: "18px"}),
$v$: (v) => value()(v),
// "arrow": M.arrow({ stroke: "cornflowerblue", strokeWidth: 3 }),
// 2: M.nil(),
// "hRef": /* M.text({contents: `${hRef ? hRef.path : 'null'}`, fontsize: "18px"}), */hRef,
},
{
"x->v": [C.hSpace(5), C.alignCenterY],
// "v->arrow/start": [C.alignCenter, C.alignMiddle],
// "arrow/end->hRef": [C.alignLeft, C.alignMiddle],
}
),
$neighbors$: createGroup({
$curr$: 'ref',
$next$: 'ref',
},
{ "curr->next": [C.alignLeft, C.vSpace(5)] }
)
},
{
"text->elements": [C.alignTop, C.alignRight],
// "box->elements": C.contains,
"box->elements": [C.sameHeight, C.sameWidth, C.alignTop, C.alignLeft],
}
)
}
Insert cell
render(data.trace[0].frames[0].stack, stack())
Insert cell
function heap() {
return createGroup({
// renderFn: M.debug,
"text": M.text({contents: "heap", fontSize: "18px"}),
$elements$: ([rc, hv]) =>
rc === 0 ? M.nil() :
createGroup({
// bbox: {
// left: Math.random() * 10,
// top: Math.random() * 10,
// },
"rc": M.text({contents: rc.toString(), fontSize: "12px"}),
"heapValue": heapValue()(hv),
},
{
"rc->heapValue": [C.hSpace(5), C.alignCenterY]
}
),
$neighbors$: createGroup({
$curr$: 'ref',
$next$: 'ref',
},
{ "curr->next": [C.alignLeft, C.vSpace(5)] }
)
},
{ "text->elements": [C.alignLeft, C.vSpace(5)] }
)
}
Insert cell
render(data.trace[1].heap, heap())
Insert cell
function frame() {
return createGroup({
$zipper$: zipper(),
$stack$: stack(),
},
{ "stack->zipper": [C.alignLeft, C.vSpace(5)] }
)
}
Insert cell
render(data.trace[0].frames[0], frame())
Insert cell
function config() {
return createGroup({
$frames$: frame(),
$heap$: heap(),
$pointers$: createGroup({
// renderFn: M.debug,
$src$: 'ref',
$tgt$: 'ref',
"arrow": M.line({ stroke: "cornflowerblue", strokeWidth: 3 }),
// "rect": M.rect({width: 20, height: 20, fill: "magenta" }),
// "circle": M.circle({r: 10, fill: "blue" }),
},
{
"src->arrow": [C.makeEqual('right', 'left'), C.makeEqual('centerY', 'top')],
"arrow->tgt": [C.makeEqual('right', 'left'), C.makeEqual('bottom', 'centerY')],
// "src->rect": [C.alignCenter, C.alignMiddle],
// "tgt->circle": [C.alignCenter, C.alignMiddle],
}
)
},
{ "frames->heap": [C.alignTop, C.hSpace(40)] }
)
}
Insert cell
viewof traceNum = Inputs.range([0, data.trace.length - 1], {step: 1, label: "traceNum"})
Insert cell
render(data.trace[traceNum], config())
Insert cell
Insert cell
Insert cell
Insert cell
C = bfjs.constraints
Insert cell
ref = (...args) => {
const path = args
.flatMap(s =>
s.replaceAll('[', '/elements/')
.split(new RegExp('/|\\]')))
.filter((x) => x)
.map(s => s === ".." ? -1 : s);

return {
$ref: true,
path,
}
}
Insert cell
Insert cell
Insert cell
mkList = (xs) => ({
elements: xs,
neighbors: xs.length < 2 ? [] :
_.zipWith(_.range(xs.length - 1), _.range(1, xs.length), (curr, next) => (
{
curr: ref(`../../elements/${curr}`),
next: ref(`../../elements/${next}`),
}
))
})
Insert cell
createGroup = (shapes, rels={}, options={}) => bfjs.createShape({shapes, rels, ...options})
Insert cell
// PL scopes
function scopeResolve(data, candidates=[]) {
if (Array.isArray(data)) {
candidates = [new Set([...Array(data.length).keys()]), ...candidates];
return data.map((d) => scopeResolve(d, candidates));
} else if (typeof data === 'object' && '$ref' in data) {
let path = data.path;
if (path[0] === -1) return data;
const numScopes = lookup(path[0], candidates);
path.unshift(...new Array(numScopes).fill(-1));
console.log(path)
path = path.map((s) => s === -1 ? '..' : s);
return ref(...path);
} else if (typeof data === 'object' && data !== null) {
candidates = [new Set(Object.keys(data)), ...candidates];
return Object.fromEntries(Object.entries(data).map(([k, v]) => [k, scopeResolve(v, candidates)]))
} else {
return data
}
}
Insert cell
function lookup(name, candidates, numScopes=0) {
const [hd, ...tl] = candidates;
if (hd.has(name)) {
return numScopes;
} else {
if (tl.length === 0) throw `${name} not found`;
return lookup(name, tl, numScopes+1);
}
}
Insert cell
createListGroup = (elementShape, relations, options={}) => createGroup({
$elements$: elementShape,
$neighbors$: createGroup({
$curr$: 'ref',
$next$: 'ref',
}, relations)
}, {}, options)
Insert cell
text = (contents, options={}) => M.text({ contents, ...options})
Insert cell
function addLists(data) {
if (Array.isArray(data)) {
return mkList(data.map((d) => addLists(d)));
} else if (typeof data === 'object' && '$ref' in data) {
return data;
} else if (typeof data === 'object' && data !== null) {
return Object.fromEntries(Object.entries(data).map(([k, v]) => [k, addLists(v)]))
} else {
return data
}
}
Insert cell

Purpose-built for displays of data

Observable is your go-to platform for exploring data and creating expressive data visualizations. Use reactive JavaScript notebooks for prototyping and a collaborative canvas for visual data exploration and dashboard creation.
Learn more