Re: Memory manager: I'm porting SmartHeap to Delphi

From: Eric Grange (egrange_at_glscene.org)
Date: 01/30/04


Date: Fri, 30 Jan 2004 09:56:27 +0100

Besides the late allocation issue Ignacio mentionned, there is also the
not-so-negligible issue of cache coherency.

For instance if you have to allocate memory for 100 records in an array
(only once), then perform some number crunching or sorting on those
records, there can be a huge performance difference depending on where
in the memory these records have been allocated: if they were tightly
fit together, they can all be available in CPU cache, which if they are
scattered over several megabytes, the CPU cache will be useless and
you'll be limited to RAM speed, whose clock speed is 5 to 10 times slower
than the CPU and has extra latencies of its own.

My very first attempts at a MM produced something that destroyed the default
manager in allocation/deallocation, but when used in a real world case,
actually degraded performance (because of that scattering). A good MM must
not only allocate/release fast, but also ensure that what is allocated is
"cache friendly" most of the time, otherwise you'll waste CPU cycles waiting
for read/write accesses to the RAM when you use the allocated memory.

Also, a common assumption in MM benchmarks is that allocations are random,
but in practice, they are *far* from being random. Each program has its own
specific patterns (which makes it hard to come with a "standard benchmark"),
but on all the proggies I've investigated, 80 to 90% of the memory allocations
were made for a dozen or so block sizes (typically the most-used-objects sizes,
the default string sizes, etc.), which were having quite "regular" lifecycles
(f.i. "gets reallocated twice on average, up to N kb then freed, rinse and repeat").

Being able to exploit these patterns automatically is in the "future plans",
some experiments with manual tweaking of the allocator showed that there are
fair boosts waiting to be harvested when the allocator is aware of those things
for all proggies whose data doesn't fit in the CPU cache.
Of course, these can also be achieved by optimizing the code, but that often
means making the code more complex and hard to maintain, while having it handled
by a memory allocator can be free, maintenance-wise.

Eric



Relevant Pages

  • Re: Active Memory Defragmentation: Our implementation & problems
    ... Low memory kernel pages are a much bigger deal to defrag. ... We've thought about having many more allocator pools around to help ease ... you go to free a file's page cache, you get some pretty large contiguous ...
    (Linux-Kernel)
  • Re: [PATCH] change gen_pool allocator to not touch managed memory
    ... The following patch modifies the gen_pool allocator to ... change is to eliminate the touching of the actual memory being allocated. ... + * Add a new chunk of memory to the specified pool. ... starting address of memory chunk to add to pool ...
    (Linux-Kernel)
  • [PATCH] [0/13] General DMA zone rework
    ... Traditionally it was designed only for ISA dma which is limited to ... On 32bit i386 with its limited virtual memory space the next zone is ... cannot actually be used for ISA or any other device with <32bit DMA mask. ... I chose to implement a new "maskable memory" allocator to solve these ...
    (Linux-Kernel)
  • Re: [RFC] genalloc != generic DEVICE memory allocator
    ... > Andrey> idea of which is a cute nice thing. ... > Keep in mind that genalloc was meant to be simple for basic memory ... generic allocator for both short-live strictly aligned blocks and ... So far I don't see any reason for changing to ...
    (Linux-Kernel)
  • Re: Q: HP-UX 11.31 slow performance without load
    ... Before the patch the kernel used to hang or crash. ... Which means that we're likely to have physical memory -- just not the ... Being inode related... ... It suggests you caused some memory to get flushed down to the allocator ...
    (comp.sys.hp.hpux)