Re: "PriorityMap"



Eric Sosman wrote:
Andreas Leitgeb wrote:
I'm in search of a data structure for the followig usecase:

Some algorithm spits out *lots* of pairs (k,v) of type (int,long).
I only want either those ten(*) pairs with the lowest keys, or all
pairs with the lowest ten(*) keys. (whichever is easier to achieve)
The keys (k) are not necessarily unique, in fact I usually get a
couple of different values (v) for each k, and I'm interested only
in the v's, that came with those ten(*) smallest k's. The actual
k's do not matter afterwards.

There are much too many to save them all and filter afterwards.

PriorityQueue and TreeMap do not seem to make it easy to efficiently prune them beyond the first ten(*) entries after each add().

Since it's called in a long loop, efficiency is an issue, but the algorithm itself is also non-trivial, so I can afford some
extra cpu-cycles.

(*): I'd prefer if the "ten" does not need to be hardwired.

Since you have "*lots*" of (k,v) pairs, it would probably
be nice to avoid the overhead of creating an Integer and a Long
and a Map.Entry (or similar) for each pair. I'd suggest just
coding the thing up for the purpose at hand -- a simple heap
made out of an int[] and a long[] seems attractive. Should be
pretty fast, too: Once you've been through a few thousand pairs,
most of the remaining pairs will be bigger than the heap's root
and will be rejected on the first comparison.


Personally, I'd try three approaches, and benchmark to pick the best:

a) add every result to a SortedMap (or sorted set) and then do an iteration to find the first ten.
b) Create a Pair class, keep a List of all the Pairs. Sort and then get a sublist
c) Create a List, as you add things, utilize Collections.binarySearch, and insert into the corresponding location, truncating the list if it becomes too long.

All three of these approaches have different merits, depending on the structure and amount of data. If I had to bet before testing, I'd put my money on (c). but it is also the most complicated to code (though not by much at all).

I'd be curious to see your results.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
.



Relevant Pages

  • Re: "PriorityMap"
    ... Some algorithm spits out *lots* of pairs of type. ... I only want either those tenpairs with the lowest keys, ... and will be rejected on the first comparison. ...
    (comp.lang.java.programmer)
  • Re: "PriorityMap"
    ... Some algorithm spits out *lots* of pairs of type. ... I only want either those tenpairs with the lowest keys, ... k's do not matter afterwards. ...
    (comp.lang.java.programmer)
  • Re: "PriorityMap"
    ... Some algorithm spits out *lots* of pairs of type. ... I only want either those tenpairs with the lowest keys, ... I'd use a PriorityQueue but with either a Comparable element type or a ...
    (comp.lang.java.programmer)
  • "PriorityMap"
    ... Some algorithm spits out *lots* of pairs of type. ... I only want either those tenpairs with the lowest keys, ... k's do not matter afterwards. ... efficiently prune them beyond the first tenentries ...
    (comp.lang.java.programmer)