Re: In which cases/problems is Prolog faster than Java?
- From: Cameron Hughes <cahughes@xxxxxxxxxx>
- Date: Fri, 23 May 2008 12:29:42 -0400
Paulo Moura wrote:
On May 23, 1:51 pm, Cameron Hughes <cahug...@xxxxxxxxxx> wrote:
...
I don't know what level of programming and algorithm analysis skills Christina has, I'm just a little
squeemish about switching a language and paradigm just to shave seconds off of a sort.
Carl Hoare's 1960's quick sort is starting to wear thin in the year 2008 (please don't take that statement wrong)
But we can and must do better. At the risk of opening a classic 'can of worms' Is there something
we can do with our new found parallelism (i.e., 2,4, and 8 processor multicore chips) to improve the Qsort?
The overhead for parallelism can be prohibitive for small datasets, but once we get to sorting datasets that have a size
where we actually care about the performance of Java qsort vs. Prolog qsort then threading etc., starts to pay for itself.
So what about parallel implementations of qsort for Prolog?
Here is one in Logtalk (which you can run with SWI-Prolog, YAP, and
XSB as a back-end compiler when using Logtalk high-level multi-
threading features):
:- object(qsort(_Threads)).
:- info([
version is 1.21,
author is 'Paul Crocker and Paulo Moura',
date is 2007/12/27,
comment is 'Multi-threaded version of the quick sort algorithm.',
parameters is ['Threads'- 'Number of threads to use in sorting.
Valid values are 1, 2, 4, 8, 16, 32, etc.']]).
:- threaded.
:- public(qsort/2).
:- mode(qsort(+list, -list), one).
:- info(qsort/2, [
comment is 'Sorts a list of terms into ascending order.',
argnames is ['List', 'Sorted']]).
qsort(List, Sorted) :-
parameter(1, Threads),
Threads > 0,
qsort(List, [], Sorted, Threads).
qsort([], Sorted, Sorted, _).
qsort([Pivot| Rest], Acc, Sorted, Threads) :-
( Threads =:= 1 ->
quicksort([Pivot| Rest], Acc, Sorted)
; Threads2 is Threads//2,
partition(Rest, Pivot, Smaller0, Bigger0),
threaded((
qsort(Smaller0, [Pivot| Bigger], Sorted, Threads2),
qsort(Bigger0, Acc, Bigger, Threads2)
))
).
partition([], _, [], []).
partition([X| Xs], Pivot, Smalls, Bigs) :-
( X < Pivot ->
Smalls = [X| Rest],
partition(Xs, Pivot, Rest, Bigs)
; Bigs = [X| Rest],
partition(Xs, Pivot, Smalls, Rest)
).
quicksort([], Sorted, Sorted).
quicksort([Pivot| Rest], Acc, Sorted) :-
partition(Rest, Pivot, Smaller0, Bigger0),
quicksort(Smaller0, [Pivot| Bigger], Sorted),
quicksort(Bigger0, Acc, Bigger).
:- end_object.
The problem with speeding up Quicksort using multiple threads is that
the actual performance very much depends on the pivot chosen at each
iteration. I.e. if the pivot does not lead to sub-partitions roughly
the same size you get little more performance compared with a
sequential version. E.g. using SWI-Prolog as the back-end compiler on
a 8-core computer with *random* lists of floats gives (using 1, 2, 4,
8, and 16 threads):
Quicksort (average of 10 runs)
1 2 4 8 16
5000 0.047746 0.047059 0.035708 0.029006 0.032277
10000 0.111083 0.100361 0.076050 0.077118 0.083268
15000 0.167924 0.174797 0.109953 0.100317 0.091422
20000 0.226519 0.212724 0.139554 0.136808 0.134318
25000 0.328326 0.337592 0.339155 0.272412 0.251659
30000 0.396432 0.384252 0.363177 0.364878 0.259189
35000 0.420288 0.246681 0.206172 0.209500 0.182987
40000 0.490231 0.338510 0.249211 0.177443 0.186040
45000 0.588948 0.575981 0.417951 0.325616 0.268653
50000 0.682585 0.698429 0.587631 0.488556 0.374016
Actual speedup also depends on the performance of the Prolog compiler.
E.g. with YAP you will need bigger lists than above to hope to beat
the sequential version (you always need to account for the threads
overhead plus the overhead of all those temporary lists being built).
Other sorting algorithms such as merge sort allow better speedups when
using multiple cores. E.g.:
Merge sort (average of 10 runs)
1 2 4 8 16
5000 0.049974 0.037345 0.027060 0.026583 0.037963
10000 0.146841 0.081894 0.054119 0.048932 0.061557
15000 0.232520 0.128032 0.086671 0.076746 0.096394
20000 0.317598 0.176163 0.119017 0.102286 0.122714
25000 0.419169 0.230886 0.148339 0.129592 0.156422
30000 0.519234 0.281993 0.184366 0.157323 0.185261
35000 0.617644 0.335898 0.216543 0.183829 0.220019
40000 0.714777 0.390772 0.249395 0.214107 0.254999
45000 0.812895 0.447561 0.284706 0.243894 0.283623
50000 0.917376 0.517273 0.322977 0.275071 0.318263
Thus, hard to beat a highly optimized C implementation of a Prolog
sorting built-in predicate for most cases.
Cheers,
Paulo
Now we're talkin.... This would have been the direction that I would've pointed Christina in.
Yes as the size of the data increases the thread overhead is very minimal in comparison to speedup.
And this is just one p-prolog implementation of quick_sort. Also once the switch is made to
a parallel version of the sort, its been my experience that the code does not change nearly as
much for Prolog as it does when switching from sequential C++ or java to parallel versions in C++ or java.
(i.e. learning curve is somewhat smaller for Prolog implementations!)
Oh Yeah!!
.
- References:
- In which cases/problems is Prolog faster than Java?
- From: Christina
- Re: In which cases/problems is Prolog faster than Java?
- From: Cameron Hughes
- Re: In which cases/problems is Prolog faster than Java?
- From: Isaac Gouy
- Re: In which cases/problems is Prolog faster than Java?
- From: Simon Strobl
- Re: In which cases/problems is Prolog faster than Java?
- From: Isaac Gouy
- Re: In which cases/problems is Prolog faster than Java?
- From: Isaac Gouy
- Re: In which cases/problems is Prolog faster than Java?
- From: Paulo Moura
- In which cases/problems is Prolog faster than Java?
- Prev by Date: Re: Networking in Prolog - a survey
- Next by Date: Re: In which cases/problems is Prolog faster than Java?
- Previous by thread: Re: In which cases/problems is Prolog faster than Java?
- Next by thread: Re: In which cases/problems is Prolog faster than Java?
- Index(es):
Relevant Pages
|
|