Re: In which cases/problems is Prolog faster than Java?



Paulo Moura wrote:

On May 23, 1:51 pm, Cameron Hughes <cahug...@xxxxxxxxxx> wrote:

...
I don't know what level of programming and algorithm analysis skills Christina has, I'm just a little
squeemish about switching a language and paradigm just to shave seconds off of a sort.
Carl Hoare's 1960's quick sort is starting to wear thin in the year 2008 (please don't take that statement wrong)
But we can and must do better. At the risk of opening a classic 'can of worms' Is there something
we can do with our new found parallelism (i.e., 2,4, and 8 processor multicore chips) to improve the Qsort?
The overhead for parallelism can be prohibitive for small datasets, but once we get to sorting datasets that have a size
where we actually care about the performance of Java qsort vs. Prolog qsort then threading etc., starts to pay for itself.

So what about parallel implementations of qsort for Prolog?

Here is one in Logtalk (which you can run with SWI-Prolog, YAP, and
XSB as a back-end compiler when using Logtalk high-level multi-
threading features):

:- object(qsort(_Threads)).

:- info([
version is 1.21,
author is 'Paul Crocker and Paulo Moura',
date is 2007/12/27,
comment is 'Multi-threaded version of the quick sort algorithm.',
parameters is ['Threads'- 'Number of threads to use in sorting.
Valid values are 1, 2, 4, 8, 16, 32, etc.']]).

:- threaded.

:- public(qsort/2).
:- mode(qsort(+list, -list), one).
:- info(qsort/2, [
comment is 'Sorts a list of terms into ascending order.',
argnames is ['List', 'Sorted']]).

qsort(List, Sorted) :-
parameter(1, Threads),
Threads > 0,
qsort(List, [], Sorted, Threads).

qsort([], Sorted, Sorted, _).
qsort([Pivot| Rest], Acc, Sorted, Threads) :-
( Threads =:= 1 ->
quicksort([Pivot| Rest], Acc, Sorted)
; Threads2 is Threads//2,
partition(Rest, Pivot, Smaller0, Bigger0),
threaded((
qsort(Smaller0, [Pivot| Bigger], Sorted, Threads2),
qsort(Bigger0, Acc, Bigger, Threads2)
))
).

partition([], _, [], []).
partition([X| Xs], Pivot, Smalls, Bigs) :-
( X < Pivot ->
Smalls = [X| Rest],
partition(Xs, Pivot, Rest, Bigs)
; Bigs = [X| Rest],
partition(Xs, Pivot, Smalls, Rest)
).

quicksort([], Sorted, Sorted).
quicksort([Pivot| Rest], Acc, Sorted) :-
partition(Rest, Pivot, Smaller0, Bigger0),
quicksort(Smaller0, [Pivot| Bigger], Sorted),
quicksort(Bigger0, Acc, Bigger).

:- end_object.

The problem with speeding up Quicksort using multiple threads is that
the actual performance very much depends on the pivot chosen at each
iteration. I.e. if the pivot does not lead to sub-partitions roughly
the same size you get little more performance compared with a
sequential version. E.g. using SWI-Prolog as the back-end compiler on
a 8-core computer with *random* lists of floats gives (using 1, 2, 4,
8, and 16 threads):

Quicksort (average of 10 runs)
1 2 4 8 16
5000 0.047746 0.047059 0.035708 0.029006 0.032277
10000 0.111083 0.100361 0.076050 0.077118 0.083268
15000 0.167924 0.174797 0.109953 0.100317 0.091422
20000 0.226519 0.212724 0.139554 0.136808 0.134318
25000 0.328326 0.337592 0.339155 0.272412 0.251659
30000 0.396432 0.384252 0.363177 0.364878 0.259189
35000 0.420288 0.246681 0.206172 0.209500 0.182987
40000 0.490231 0.338510 0.249211 0.177443 0.186040
45000 0.588948 0.575981 0.417951 0.325616 0.268653
50000 0.682585 0.698429 0.587631 0.488556 0.374016

Actual speedup also depends on the performance of the Prolog compiler.
E.g. with YAP you will need bigger lists than above to hope to beat
the sequential version (you always need to account for the threads
overhead plus the overhead of all those temporary lists being built).
Other sorting algorithms such as merge sort allow better speedups when
using multiple cores. E.g.:

Merge sort (average of 10 runs)
1 2 4 8 16
5000 0.049974 0.037345 0.027060 0.026583 0.037963
10000 0.146841 0.081894 0.054119 0.048932 0.061557
15000 0.232520 0.128032 0.086671 0.076746 0.096394
20000 0.317598 0.176163 0.119017 0.102286 0.122714
25000 0.419169 0.230886 0.148339 0.129592 0.156422
30000 0.519234 0.281993 0.184366 0.157323 0.185261
35000 0.617644 0.335898 0.216543 0.183829 0.220019
40000 0.714777 0.390772 0.249395 0.214107 0.254999
45000 0.812895 0.447561 0.284706 0.243894 0.283623
50000 0.917376 0.517273 0.322977 0.275071 0.318263

Thus, hard to beat a highly optimized C implementation of a Prolog
sorting built-in predicate for most cases.

Cheers,

Paulo

Now we're talkin.... This would have been the direction that I would've pointed Christina in.
Yes as the size of the data increases the thread overhead is very minimal in comparison to speedup.
And this is just one p-prolog implementation of quick_sort. Also once the switch is made to
a parallel version of the sort, its been my experience that the code does not change nearly as
much for Prolog as it does when switching from sequential C++ or java to parallel versions in C++ or java.
(i.e. learning curve is somewhat smaller for Prolog implementations!)

Oh Yeah!!



.



Relevant Pages

  • Re: In which cases/problems is Prolog faster than Java?
    ... The overhead for parallelism can be prohibitive for small datasets, but once we get to sorting datasets that have a size ... comment is 'Multi-threaded version of the quick sort algorithm.', ... partition(Rest, Pivot, Smaller0, Bigger0), ... qsort(Bigger0, Acc, Bigger, Threads2) ...
    (comp.lang.prolog)
  • Re: Pivot Table sort by count of crashes 2008 but not 2004
    ... I then perform a pivot table report and do a sort on one of the columns. ... descending on the count of that column, Excel 08 always crashes. ...
    (microsoft.public.mac.office.excel)
  • Re: using DSUM in formulae instead of SUMIF
    ... I have a spreadsheet that I enter the pertinent customer contact data into ... Is it possible to somehow sort as the above example illustrates, ... Will the SUMIF function accept IF types of arguments ... > If the pivot table source data will change size frequently, ...
    (microsoft.public.excel.worksheet.functions)
  • Re: Pivot Table Report formatting - cant select Data Source Order
    ... For pivot tables built from Excel data, the items are listed in Ascending order when the pivot table is created. ... If you don't want to sort the items manually, you could create a custom list, and base the sort order on that. ... Select the Custom Lists tab ... It's importing them in ascending order by default, and I can't select Data Source Order from the toolbar's Sort and Top 10/AutoSort options. ...
    (microsoft.public.excel.misc)
  • Re: Variable number of multiple rows per record; want summary by record
    ... It looks like the sort of thing a pivot table could handle. ... Joe 8 11 ... The new sheet will have one row per person, ...
    (microsoft.public.mac.office.excel)