How does emacs mark the stack for GC

Question

Since I tried creating my own Lisp interpreter (just for learning purposes, as I knew it wouldn't get too far), I have read and tried to understand the internal workings of the Elisp interpreter and always wondered how Emacs does its garbage collection.

I know that the main idea in a Lisp interpreter is that every object that is not reachable from the stack or is the variable or function value of a symbol (basically accessible from the obarray), is considered garbage and is collected when new memory is needed. But what happens when the only reference to an object is in the stack?

How does Emacs know which block of data corresponds to a Lisp object? After all, they are mixed with the complete C/computer stack with function return addresses and C variables not necessarily used inside the Elisp side of Emacs.

After reading the Emacs source file alloc.c, I understand this is done through a tree which holds the address of all allocated memory blocks, and that should in theory solve the problem, but what happens when by pure chance a block of memory stores a valid Elisp object even when it is not supposed to be one? So, could someone please explain to me how Emacs marks/uses the C stack for Elisp function calls?

There are a few Emacs C developers who read this forum, but Eli Z. does not (as far as I am aware). And, there are some old-timers who are also very knowledgeable who read this forum. It may behoove you to "Submit a link" to this question on https://www.reddit.com/r/emacs/new/ so that you gain access to Eli Z. and a few other heavy-hitters. — lawlist, Jan 03 '18 at 20:30

score 2 · Answer 1 · answered Jan 04 '18 at 05:07

but what happens when by pure chance a block of memory stores a valid Elisp object even when it is not supposed to be one?

Emacs conservatively marks the stack, which means that it examines each word on the stack and considers whether it "looks like" a Lisp object. There is a comment in alloc.c that explains the general approach:

/* Conservative C stack marking requires a method to identify possibly live Lisp objects given a pointer value. We do this by keeping track of blocks of Lisp data that are allocated in a red-black tree (see also the comment of mem_node which is the type of nodes in that tree). Function lisp_malloc adds information for an allocated block to the red-black tree with calls to mem_insert, and function lisp_free removes it with mem_delete. Functions live_string_p etc call mem_find to lookup information about a given pointer in the tree, and use that to determine if the pointer points to a Lisp object or not. */

Conservative marking generally has two weaknesses.

One is what you have pointed out: the GC can accidentally keep data alive. For example if Emacs happens to compute some integer value that is also a valid object reference, then that object mayb be kept alive unnecessarily. The Emacs response to this is to do nothing. I suppose it is unlikely to be a big problem in practice, though you could maybe make it happen by playing tricks with bit vectors.

Another weakness is that sometimes compiler optimizations can make it so that there are no obvious references to an object which is live. As far as I know, Emacs doesn't try to do anything about this problem, either.

`> Another weakness is that sometimes compiler optimizations can make it so that there are no obvious references to an object which is live. As far as I know, Emacs doesn't try to do anything about this problem, either.` This looks like a serious problem. Do we have any bug reports because of this? I think it would be hard to go from the bug to the root cause if this happens. — narendraj9, Dec 06 '21 at 12:01

How does emacs mark the stack for GC

1 Answers1