Re: sort algorithm



"Juha Nieminen" <nospam@xxxxxxxxxxxxxx> wrote in message news:Ykmkk.207$kI6.71@xxxxxxxxxxxxxxxx
Richard Heathfield wrote:
I used bucket-sort. For this application, I heartily recommend it.

Wouldn't a simple counting sort have been much simpler to implement
(and probably faster)? Something like:

size_t counts[101] = { 0 };

for(size_t i = 0; i < 100000000; ++i)
++counts[array[i]];

Why bother with this step:

size_t arrayInd = 0;
for(int cInd = 0; cInd <= 100; ++cInd)
for(size_t counter = 0; counter < counts[cInd]; ++counter)
array[arrayInd++] = cInd;

The first loop generates all the information that you need. To output the sorted list, just dump the counts array (in a manner pretty well idential to your assigments loop, but without bothering to perform the assignments). I guess that the compact representation would be more useful for most purposes, considering the limited nature of the data.

The speed of this depends heavily on the actual integral type used in
the array. For example in my computer (3.4GHz Pentium4) sorting an array
of ints like this takes about 400ms, but if it's an array of chars
instead, it takes only 170ms.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

static int array[100000000];

static const double clock_inv = 1.0 / CLOCKS_PER_SEC;

int main(void)
{
size_t counts[101] = {0};
size_t a_dim = sizeof array / sizeof array[0];
size_t i;
clock_t start,
end,
save;
size_t arrayInd = 0;
int cInd;
size_t counter;

srand(time(NULL));
for (i = 0; i < 100000000; ++i) {
array[i] = rand() % 101;
}
save = start = clock();
for (i = 0; i < 100000000; ++i)
++counts[array[i]];
end = clock();
printf("distribution counting into bins took: %f seconds\n", (end - start) * clock_inv);
start = clock();
for (cInd = 0; cInd <= 100; ++cInd)
for (counter = 0; counter < counts[cInd]; ++counter)
array[arrayInd++] = cInd;
end = clock();
printf("distribution movement into array took: %f seconds\n", (end - start) * clock_inv);
printf("counting and movement into array took: %f seconds\n", (end - save) * clock_inv);
return 0;
}
/*
C:\tmp\test\x64\Release>test
distribution counting into bins took: 0.109000 seconds
distribution movement into array took: 0.125000 seconds
counting and movement into array took: 0.234000 seconds

C:\tmp\test\x64\Release>test
distribution counting into bins took: 0.109000 seconds
distribution movement into array took: 0.125000 seconds
counting and movement into array took: 0.234000 seconds

C:\tmp\test\x64\Release>test
distribution counting into bins took: 0.109000 seconds
distribution movement into array took: 0.125000 seconds
counting and movement into array took: 0.234000 seconds

C:\tmp\test\x64\Release>test
distribution counting into bins took: 0.109000 seconds
distribution movement into array took: 0.125000 seconds
counting and movement into array took: 0.234000 seconds
*/

** Posted from http://www.teranews.com **
.



Relevant Pages

  • Re: The inaugural VB6 vs dot net test
    ... If you do Command1 first, ... Command2 operates on an array that already has been ... way than counting up loops. ...
    (microsoft.public.vb.general.discussion)
  • Re: Using size_t clearly (appropriately?)
    ... a justification for using 'size_t' to represent array indices. ... designed to count the number of elements in an array of struct foobar. ... In general case 'size_t' is not applicable for counting objects at all. ... In general case it's range is not sufficient (16-bit platform again). ...
    (comp.lang.c)
  • Re: Using size_t clearly (appropriately?)
    ... a justification for using 'size_t' to represent array indices. ... designed to count the number of elements in an array of struct foobar. ... size_t has *at least* enough range for the purpose. ... If the language had a type to be used generically for counting ...
    (comp.lang.c)
  • Re: 0 based collections
    ... Visual Basic has in my opinion taken the right approach that counting start ... Zero other than Null. ... However, almost every collection and array in VBNet starts at Zero, with the ...
    (microsoft.public.dotnet.languages.vb)
  • Re: FREQUENCY function clarification
    ... Many rows had the same store (575 possible stores). ... The SSN is in the “general” number format, ... The data_array is the array of numbers you want to analyze. ... equals the number of bins +1. ...
    (microsoft.public.excel.misc)