Re: Mergesort Vs Quicksort



Robert Maas, http://tinyurl.com/uh3t wrote:

Actually my algorithm with two auxilary/temporary arrays is pretty simple:
Each recursive call takes these parameters:
- Starting (inclusive) index of sub-array to sort
- Ending (exclusive) index of sub-array to sort
- Pointer to array containing the actual data in this range to sort. - Pointer to first auxilary array which might contain valid data
outside this range but is garbage within this range.
- Pointer to second auxilary array which might contain valid data
outside this range but is garbage within this range.
Return value is pointer to whichever of the three arrays happens to
actually contain this particular sub-array after sorting.

Algorithm: Find the halfway point among the records in this sub-array,
thereby dividing this sub-array into two smaller sub-arrays
[Start,Mid) and [Mid,End)
Recurse on each, passing the two auxilary arrays as-is each time.
Upon return, determine which of the three arrays contains each of
the two sorted results. If they are different, merge them into the
third array that currently contains garbage over the full sub-array
[Start,end), else pick at random one of the two arrays that *each*
contains garbage to merge into.
Return pointer to whatever array the merge was to.

My fastest algorithm uses two callback functions:
one for comparing the records and
one for returning the length of a record.

In ten times the amount of time that it takes
to return the size of every record:
I stable sorted by stringlength,
216098 variable length strings
which were packed into an 8000000 byte array.

This is the prototype for the sorting function in C code:

#include <stddef.h>

int rec_sort(void *base,
size_t nrec,
size_t (*rec_size)(const void *),
int (*compar)(const void *, const void *));

Three sorts were done.
The first was done with a compar function named "ncmp",
that always returned zero.
The second was done with a compar function named "scmp",
that sorted alphabetically.
The result of that sort was then sorted
with a compar function named "lcmp", which sorted by string length.
The stable sort resulted in a sequence in which same length strings
were in alphabetic order.


Statistics:

Within the definition of rec_sort(),
the lines of code which contained function calls
were found to take all of the measurable amounts of time.

Allocated memory usage:
ptrs = malloc(nrec * sizeof(char *));
buff = malloc(nrec / 2 * sizeof(char *));
free(buff);
buff = malloc(array_size);
free(ptrs);
free(buff);
array_size = 8000000 bytes
nrec = 216098 records
nrec + 1 = 216099
nrec * 2 = 432196
nrec * log2(nrec) / 2 = 1914771
nrec * log2(nrec) = 3829543
Number of
function Calls
malloc() = 3
free() = 3
memcpy() = 216099
recsize() = 432196
ptr_merge( ) = 1
(ncmp) = 1878269
(scmp) = 3560041
(lcmp) = 3534118
Time consumed
by function calls
malloc(), free() = 0.02 seconds
memcpy() = 0.09 seconds
recsize() = 0.22 seconds
malloc(), free(), memcpy(), recsize() = 0.33 seconds

ptr_merge(ncmp) = 0.03 seconds
ptr_merge(scmp) = 0.37 seconds
ptr_merge(lcmp) = 0.80 seconds

malloc(), free(), memcpy(), recsize(), ptr_merge(ncmp) = 0.36 seconds
malloc(), free(), memcpy(), recsize(), ptr_merge(scmp) = 0.70 seconds
malloc(), free(), memcpy(), recsize(), ptr_merge(lcmp) = 1.13 seconds

The time used by recsize
is the time that it takes to size each record twice.
The time used by memcpy
is the time that it takes to move each record twice:
once, one at a time,
and again by copying the whole array with one memcpy call.

This is the output from the rec_sort_test:

/* BEGIN output from rec_sort_test.c */

A contiguous block of strings remains in the original order.
The comparison function always returned 0 to rec_sort():
cmp_ctr is 1878269
0.360000 seconds.

A contiguous block of strings has been sorted alphabetically by rec_sort():
cmp_ctr is 3560041
0.703000 seconds.

A contiguous block of strings has been sorted by string length by rec_sort():
cmp_ctr is 3534118
1.125000 seconds.

The array was 8000000 bytes in size, and the array contained 216098 records,
each ranging anywhere from 2 to 72 bytes in size.

/* END output from rec_sort_test.c */

The test program is here:
http://www.mindspring.com/~pfilandr/C/rec_sort/rec_sort_test.c

The rec_sort function is here:
http://www.mindspring.com/~pfilandr/C/rec_sort/rec_sort.c
http://www.mindspring.com/~pfilandr/C/rec_sort/rec_sort.h

--
pete
.