Re: Switch from SBCL to Erlang backend due to scalability issues (GC).

Wade Humeniuk wrote:
Matthew Swank wrote:
This seems to be making the rounds; though Lisp is still used as a source
language, then compiled to Erlang.

From the developer's comments ....

The "benchmark code" is difficult to isolate. It's not like we're computing Fibonacci sequences here... there has to be a stream of events coming in from real-world users. SBCL starts to use an insane amount of memory, and the footprint grows even though the set of reachable objects (essentially we use the internal functions (room) calls to walk the objects, and the output of (room) itself) doesn't get much bigger. It's been a while since we've run any numbers on this problem so I don't have any handy.

Clearly something changed with the garbage collector in more recent versions of SBCL -- now SBCL uses 100% CPU all the time in production and lasts much longer before finally bombing due to exhaustion of its pre-committed space.

And we aren't using OS-level threads at all.
By a1k0n at Mon, 2007-03-05 17:39

On the assumption that everyone is acting rationally and intelligently,
and staring at the SBCL source for a while... it feels like the problem
is related to the usual culprits in C code.

- Stack overflow. Though there seems to be some sort of guard region
in the stack, it's unlikely that there is any hardware protection to
stop problems. Also, signals are running on the stack. At least in the
new code, pthreads are created with limited memory (I think 1 MB).
In a system running under load, perhaps there is a potential to overflow
(maybe even a signal overrunning the stack). If the code is compiled with
(safety 0) I assume SBCL will not do any guard checking. a1k0n
says they do not use pthreads; are all the threads using the main process
stack? A bunch of Lisp threads all sharing the stack and overrunning?
(Seems unlikely.)

- Heap corruption due to running unsafe code. Clobber the heap and the collector
goes who knows where, or even loses references to other objects.

So, if anyone is listening, run everything with (safety 3) (debug 3) and
rebuild SBCL with bigger stacks for threads. Try again.. and again and
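
For anyone wanting to try the (safety 3) (debug 3) suggestion, a minimal
sketch follows. It assumes the application is recompiled from source after
these forms are evaluated, and that the SBCL in use provides
sb-ext:restrict-compiler-policy (not every older release does). The
(speed 1) setting is my own choice for illustration, not something the
developers mentioned.

```lisp
;; Sketch only: request full safety and debug information globally.
;; DECLAIM affects code compiled after this form has been processed,
;; so already-compiled fasls must be rebuilt to pick it up.
(declaim (optimize (safety 3) (debug 3) (speed 1)))

;; SBCL-specific (assumed available in the version being used):
;; put a floor under the policy so that a local
;; (declare (optimize (safety 0))) cannot lower it again.
#+sbcl
(progn
  (sb-ext:restrict-compiler-policy 'safety 3)
  (sb-ext:restrict-compiler-policy 'debug 3))
```

The restriction matters because a global declaim alone does not override
local (safety 0) declarations buried inside individual functions or
libraries, which is exactly where heap-clobbering code would hide.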

I assume they are running on Linux. Is that right?