Re: Mergesort Vs Quicksort



From: pete <pfil...@xxxxxxxxxxxxxx>
I would do it with a linked list.

You're ignoring the ground rules of this contest: Your data is
given as variable-length records packed end-to-end in one array,
and the data can be parsed only from the left end towards the
right, and you're limited in the quantity of extra storage you
allocate temporarily to run your algorithm, which must perform
stable-sort. The final result must be those same records, packed
back into that same array, but now in ascending sequence, and in
case of ties the left-most of any two originals must also be
left-most among final location of those two records, which makes no
difference if you're sorting the whole records, but *does* make a
difference if the keys being sorted are not the whole records.

Also, we'd like to maintain good locality of reference, so that if
the whole array is much larger than physical RAM, and the records
are initially in random sequence, at no point during the process
will virtual memory thrash because of references jumping all over
virtual address space at random.

Building a linked list violates locality of reference, and it is
not guaranteed that it will fit in the half-sized aux array if you
happen to have a very large number of very small records.

If your dataset is large enough that you can't allocate two full
sized arrays to use my fastest algorithm, then you are way past the
physical RAM size, and hence random-access of memory implied by
liked-list merge-sort will surely thrash your swapping VM like mad.

If your physical dataset is small enough that the whole thing fits
in physical RAM, so that linked-list won't thrash, then you have
oodles of extra virtual memory you can allocate virtually for free,
so you might as well pre-process the entire dataset to include the
length at the front of each record so that skipping a record later
doesn't require any parsing, or pre-process the entire dataset to
build a linked list, it doesn't matter which way you do it, either
way requires parsing *every* record once before you even start
sorting, and then you never have to parse that record again when
simply moving or sliding, although you will typically need to parse
at least part of each record being compared in order to locate the
logical parts of the key. And if the key isn't just an initial
segment of the reecord, your preprocessing either way should also
include moving the key explicitly to the front, immediately after
the explicit length.

Original record layout: 222222.......1111.3333.44444444.555555.
Sample record data: Robert_Elton_Maas_Math_Computer_Lonely.
(In record layout, numbers show parts of key in logical sequence,
dots show non-key data to be ignored during sorting.)
Reformatted record data: Maas_Robert_Math_Computer_Lonely|Elton.
End of key, start of non-sorting data, here------------------>|
With length added,
and final delimiter removed: 002AMaas_Robert_Math_Computer_Lonely|Elton
(28 is hexadecimal for 40, the total number of bytes including length)
(Two hexChars are one byte in the length, one USASCII per byte in text.)
With linked list, you still need to re-arrange the key to the
front, but you won't need to put the length at the very front.

Let's say you *do* want to pre-process each record to move the key
to the front. Which takes longer?
- Also putting the length in the front, and packing all the
resultant length+key+rest into one super-sized array?
- Allocating each key+rest as a separate object on the heap, and
also allocating each CONS cell on the heap, and linking CONS
cells in a chain, and linking each CONS cell to the
corresponding key+rest record?

Later, after sorting is done, you need to re-arrange each
key+record back to original sequence as you pack them back into the
original array as return value to your caller. But you have to also
do additional clean-up work. Which takes longer?
- Deallocating that one super-sized array.
- Individually deallocating each of the key+rest records and also
individually deallocating each of the CONS cells?
.



Relevant Pages

  • Windows array allocation problem
    ... I'm getting an "insufficient virtual memory" error when trying to allocate an array to a size which is about 385 MB less than the amount of available virtual memory. ...
    (comp.lang.fortran)
  • Re: Windows array allocation problem
    ... allocate an array to a size which is about 385 MB less than the amount of available virtual memory. ... In both cases a call to Windows API function GlobalMemoryStatus or GlobalMemoryStatusEx immediately before the ALLOCATE statement returns a value of available virtual memory of 2,118,221,824 bytes. ... So I'm getting an "insufficient virtual memory" failure when trying to allocate the array to a size which is 384.665 MB less than the amount of reported available virtual memory. ...
    (comp.lang.fortran)
  • Re: Windows array allocation problem
    ... allocate an array to a size which is about 385 MB less than the amount of available virtual memory. ... In both cases a call to Windows API function GlobalMemoryStatus or GlobalMemoryStatusEx immediately before the ALLOCATE statement returns a value of available virtual memory of 2,118,221,824 bytes. ... So I'm getting an "insufficient virtual memory" failure when trying to allocate the array to a size which is 384.665 MB less than the amount of reported available virtual memory. ...
    (comp.lang.fortran)
  • Re: Windows array allocation problem
    ... allocated to the first array. ... fragmenting the virtual memory space during startup before any code is ... the largest array I could allocate was approximately 2.084 ...
    (comp.lang.fortran)
  • Re: Cons cell archaic!?
    ... from s-expression or XML or other syntax you keep the bloated array ... For using vectors to emulate lists that ... Allocate 2, move 1 element: ... What do you think of that algorithm? ...
    (comp.lang.lisp)