Re: Merging sorted lists/iterators/generators into one stream of values...



"Alex Martelli" <aleax@xxxxxxxxxxxxxxxx> wrote:

> George Sakkis <gsakkis@xxxxxxxxxxx> wrote:
> ...
> > > manipulation of a heap to place an item in the right spot, but with 4-5
> > > or a few more sources might not make an impact at all.
> >
> > Unless you're talking about hundreds or thousands sources, it probably
> > won't. I would still go for the heap solution since IMO the resulting
> > code it's more readable and easier to understand.
>
> I'm not so sure about either sentence...:
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=S
> Best time for 10 loops: 0.247116088867
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=H
> Best time for 10 loops: 0.10344004631
>
> i.e., a heap solution may be over 4 times faster than a sort-based one
> (in the following implementations).

Interesting; I thought timsort on small almost ordered lists would be practically as fast as the
heap. Still, how is 0.10344 over 4 times faster than 0.247116 ?

> Readability seems quite comparable
> (skipping the rest of the infrastructure, which generates random sorted
> streams and ensures a stream is exhausted and verifies it etc etc):
>
> def merge_by_sort(streams):
> sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
> while sources:
> sources.sort(reverse=True)
> best_source = sources[-1]
> yield best_source[0]
> try: best_source[0] = best_source[-1]()
> except StopIteration: sources.pop()
>
> def merge_by_heap(streams):
> sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
> heapq.heapify(sources)
> while sources:
> best_source = sources[0]
> yield best_source[0]
> try: best_source[0] = best_source[-1]()
> except StopIteration: heapq.heappop(sources)
> else: heapq.heapreplace(sources, best_source)

Indeed, these are almost equivalent as far as readability goes; the previous examples in the thread
were less clear. By the way, why do you need 'i' and enumerate above ?

> Hmmm, I wonder if something like merge_by_heap would be a good candidate
> for itertool. Raymond...?
>
>
> Alex

Yes, it would be nice to make it into 2.5.

George


.



Relevant Pages

  • Does this make sense?
    ... I have a program that runs an array through several loops in a single ... it bugs me that I'm using more variables in the function than I need just ... for readability purposes. ... Is there something analogous to Linux hard links where I could have a line ...
    (comp.lang.c)
  • Re: map shall not return an Enumerator ( was re guru help )
    ... #send_to_elements) but about readability. ... But maybe I am reading too much about Clojure lately;). ... doodling with the block form of an enumerator: ... could be the foundation of the "functional" interface of streams as I ...
    (comp.lang.ruby)
  • Re: Confessions of a Python fanboy
    ... This method can really cleanup some ugly for loops, ... The issue here is of course that `map` and comprehensions are transformations. ... Furthermore and this is the most problematic limitation of Python here, `lambda` doesn't allow complex transformations due to its restrictions, so one has to switch to named functions which works but isn't sexy (and tends to lower readability imo). ...
    (comp.lang.python)
  • Re: Confessions of a Python fanboy
    ... This method can really cleanup some ugly for loops, ... The issue here is of course that `map` and comprehensions are transformations. ... Furthermore and this is the most problematic limitation of Python here, `lambda` doesn't allow complex transformations due to its restrictions, so one has to switch to named functions which works but isn't sexy (and tends to lower readability imo). ...
    (comp.lang.python)
  • Re: *Naming Conventions*
    ... 1/ too much terseness (like too much verbosity) goes against ... Python's forloops are very different... ... comes to readability. ...
    (comp.lang.python)