Re: Cons cell archaic!?



From: Scott <xsco...@xxxxxxxxx>
There are better strategies than you list (pun not originally
intended :-).

Yeah, sometimes puns, and also cliches, strike when we least expect.

You can realloc a vector as you go, allocating some extra slots
each time in anticipation of future appends. If you bump up the
size exponentially (making it a constant multiple larger at each
realloc), the amortized cost for appending to a vector is only
O(1). Moreover, if you choose a factor less than 2 (say 1.5 or
something) for your reallocations, you'll use less memory than a
linked list when you're finished.

Hmm, so when you are all done parsing a level of list structure
from s-expression or XML or other syntax you keep the bloated array
instead of contracting it to exactly-correct size? Unfortunately if
your factor is strictly between 1 and 2 then when splitting a
recycled old array to use part of it for a new not-yet-large array
you suffer storage fragmentation. It might be better to use a
factor of exactly 2 so that both halves of a split large array can
be directly used for a previous-size array that appears later in
the progression up the size scale. (The fibonacci progression is
another well-known algorithm, which doesn't have an exactly fixed
factor, but otherwise satisfies both your general description and a
desire to avoid excessive fragmentation. Of course it would need to
be exact fibonacci times whatever size block is the proper
alignment for the system you're working on, rather than exact
fibonacci number of bytes. But exact factor of 2 works better with
virtual memory, never splitting an array across a page boundary
unless that array is larger than a page, simultaneously working
well with several nested sizes of cache blocks from CPU to RAM to
disk sector.)

If you're memory allocator is smart, you might occasionally get a
larger vector without any copying.

True, once in a blue moon.

Finally, traversing a vector will cause less cache grief than
traversing a linked list.

That efficiency consideration really depends on the particular
application you're running. For using vectors to emulate lists that
are used for binary or multinary search trees, where typically you
look at only a few items at each level of the tree before jumping
down to a sub-tree, it wouldn't help much. For mapping a function
down an array such as in vector processing, it might help a lot.
For toplevel read-JIT-execute-print loop in Lisp, it's moot because
the entire nested-list of code is generated once as output from
READ and then read once during JIT compilation and then sits idle
ever after except during debugging when somebody needs to see the
code that triggered the bugtrap.

Here's another idea to try: Binary-chained CDR-coding where the
block sizes are always exact powers of two. Here's the sequence of
adding one unit at a time at the end of growing list:
Allocate 2:
1 [q/cdr]->NIL
2 [q/w]
Allocate 2, move 1 element:
3 [q/cdr]->[w/e]
Allocate 4, discard 2+2, move 3 elements:
4 [q/w/e/r]
Allocate 2, move 1 element:
5 [q/w/e/cdr]->[r/t]
Allocate 2, move 1 element:
6 [q/w/e/cdr]->[r/cdr]->[t/y]
Allocate 4, move 3 elements, discard 2+2:
7 [q/w/e/cdr]->[r/t/y/u]
Allocate 8, move 6 elements, discard 4+4:
8 [q/w/e/r/t/y/u/i]
Allocate 2, move 1 element:
9 [q/w/e/r/t/y/u/cdr]->[i/o]
Allocate 2, move 1 element:
10 [q/w/e/r/t/y/u/cdr]->[i/cdr]->[o/p]
Allocate 4, move 3 elements, discard 2+2:
11 [q/w/e/r/t/y/u/cdr]->[i/o/p/a]
Allocate 2, move 1 element:
12 [q/w/e/r/t/y/u/cdr]->[i/o/p/cdr]->[a/s]
Allocate 2, move 1 element:
13 [q/w/e/r/t/y/u/cdr]->[i/o/p/cdr]->[a/cdr]->[s/d]
Allocate 4, move 3 elements, discard 2+2:
14 [q/w/e/r/t/y/u/cdr]->[i/o/p/cdr]->[a/s/d/f]
Allocate 8, move 7 elements, discard 4+4:
15 [q/w/e/r/t/y/u/cdr]->[i/o/p/a/s/d/f/g]
Allocate 16, move 15 elements, discard 8+8:
16 [q/w/e/r/t/y/u/i/o/p/a/s/d/f/g/h]
Allocate 2, move 1 element:
17 [q/w/e/r/t/y/u/i/o/p/a/s/d/f/g/cdr]->[h/j]
etc.
What do you think of that algorithm? It requires only a single
CDR/ELEMENT flag for the whole array, any size block, which tells
how the last array cell will be interpreted. All other array cells
are *always* treated as elements (i.e. CARs). Small arrays
discarded when repacking to larger arrays are re-cycled rapidly,
2-arrays almost immediately, 4-arrays after not too long, etc.
So with a reference count system or forced discarding to free-list
this is very efficient storage-wise. Also most of the time only
elements near the end of the list need to be copied from two old
arrays to one new array. Also note the algorithm is nicely
mostly-tail recursive where repacking occurs on the way back up
using only the last two elements of the previous chain.
(Most of algorithm here: If the last two old blocks are the same
size, allocate twice that size and re-pack all old elements from
both old blocks and discard both old blocks. If the last two old
blocks are different size, allocate size 2 and move just one old
element from last old block. The only special cases are at the
very start when creating the very first size-2 array with cdr=NIL
and then the next step when simply replacing cdr=NIL with
element=whatever.)
Note that the stack for keeping track of all cells and allowing
efficient continuation without *ever* needing to scan from the
start of the chain to find the end again during building the chain
requires only log(n) depth worst-case. For efficiency this stack
could be implemented as an ordinary adjustable array with power of
two total capacity and fill pointer to show logical size at any
moment, using your original allocation algorithm with factor=2.
A similar idea could be used to make NTH efficient: Have a 2-d
array where one row has the pointers to the segments of the list
and the other row has the sizes of those segments. Then a single
pass through the second row counting down from the NTH parameter
would find which place to follow the pointer of the first row, and
the remainder just before zero-crossing would show direct index
within that segment of the list. So for a list of length N,
implemented as chain of power-of-two CDR-coded arrays plus that
continuation stack converted to 2-d array, only log(N) steps would
be needed to find any desired element.

By the way, I thought of that new algorithm just tonight while
composing this message. It came out really nice the first time!
I have no idea if anybody else ever thought of this same algorithm
previously.

But see below for slightly bad news:


Now in fact an array of N cells has some extra header words that
make its total allocation size some constant larger than N*k bytes
of memory where k is the size of a single CAR cell.
(Assuming you're not using BIBOP where each page of RAM stores only
one type of data-block of constant size so you don't need any tag
bytes at all to identify type+length of each allocated block.
The above algorithm would work just fine as-is with BIBOP.
Another way that would work is if all the size+tag info is in
each pointer to a block rather than in the block pointed at.)
This would require slight modification of the algorithm to make
sure the *total* allocation unit was power of two. For example, if
the total header (size number plus type-tag code) of an allocation
block takes the same storage as one CAR-or-CDR pointer, and we have
these block types available:
- lista treats all cells after the header as elements
- listd treats last cell as CDR link, all others (after header) as elements
- listx ignores last cell, treats all cells between header...last as elements
then the sequence might go more like this:
Allocate 2:
1 [2#lista/q]
Allocate 4, copy 1 element, release 2:
2 [4#listx/q/w/ ]
Change tag type:
3 [4#lista/q/w/e]
Allocate 4, copy 1 element, change tag type:
4 [4#listd/q/w/ptr]->[4#listx/e/r/ ]
Change tag type:
5 [4#listd/q/w/ptr]->[4#lista/e/r/t]
Allocate 4, copy 1 element, change tag type:
6 [4#listd/q/w/ptr]->[4#listd/e/r/ptr]->[4#listx/t/y/ ]
Allocate 8, copy 6 elements, discard 4+4+4:
7 [8#lista/q/w/e/r/t/y/u]
Allocate 4, copy 1 element, change tag type:
8 [8#lista/q/w/e/r/t/y/ptr]->[4#listx/u/i/ ]
This is sufficiently complicated that I'm not sure this is a clean
algorithm, and it's so far past bedtime that I don't feel like
working it all out tonight. Actually at step 6->7, I could have
just changed the tag type of that last 4-array instead of repacking
already. Then repacking to 8-array would occur at step 7->8.
It's amazing I saw that when so groggy hours past bedtime!!
If this algorithm works at all, it probably will always require
keeping local pointers to the last three arrays in the chain
unlike the original algorithm which required only two pointers.
But still it's almost as nicely mostly-tail recursive if it works
in a uniform way. I'm too groggy to work out how many blocks get
combined at higher levels than 4+4+4->8. G'nite all.
.



Relevant Pages

  • Re: garbage collection problem in large linked lists
    ... Instead all buckets are allocated at once in an array. ... VG.net, which must be very scalable, but only for lists which would normally ... Linked lists require pointer dereferencing in order to traverse. ... When you run out of space, you allocate a new smaller array. ...
    (microsoft.public.dotnet.framework.performance)
  • Storing/Retrieving TYPEs with ALLOCATABLE components (TR) (long)
    ... tBrd, including array descriptor of tEn )). ... Without previous DEALLOCATE, the allocate line fails at run time with message ... the fact that I'm loading an invalid descriptor tBrd%tEn from the file... ... status (which is not possible according to Standard, but then BINARY files ...
    (comp.lang.fortran)
  • Re: Storing the size of an array in the structure itself
    ... >> I think every C programmer can relate to the frustrations that malloc ... >> the size of an array must be stored separately to be a nightmare. ... is anything more than just that - a chunk of memory. ... > Otherwise you couldn't tell it how much to allocate. ...
    (comp.lang.c)
  • Re: Question on Fortran implementation of very large array heaps (optimal minimization)
    ... size of an allocated array. ... Allocate a temp of the ... replace the last copy with a pointer assignment rather ... > arrays (not linked lists), but as the size requirements of the heap ...
    (comp.lang.fortran)
  • determining available space for Float32, for instance
    ... I am looking for a way to determine the maxium array size I can allocate ... We do not want a solution that requires recompiling Python, ... agents may be households that choose a new gridcell to live in. ... Each attribute of a dataset has such a 2D array. ...
    (comp.lang.python)