Re: Ordered dict by default



That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).

Er, doing a cursory review of my code has an overwhelmingly higher incidence
of insertions then traversals with my dictionaries: many dictionaries are
never traversed and when they are it is only once, and often had tens to
thousands of insertions before that traversal happens.

So a slight decrease in insertion speed at the cost of faster traversal seems
like a complete loss to me.

Now, I also do recognize the utility of ordered dictionaries in some cases, but
exactly what you mean by ordered varies. I have two cases where "ordered"
has the keys are in a specific custom order. I have four cases where "ordered"
means maintaining insertion order. For the former I do sorted(dict) on access
when I need to do something order-related. For the latter I have a simple
ordered dictionary implementation that maintains the keys as a separate list
attached to the dictionary.

But interestingly enough to me, in all the cases where I use an ordered dict
in my code, performance is completely irrelevant. Its tens to hundreds of
items usually, and the best optimization in the world wouldn't add more then
an imperceptible fraction of a second, and the memory lost to the extra
list adds up to nothing relevant.

I'm not claiming my use cases are everyones, not by any means. But: this
seems like a complete lose-lose for an idea from my perspective. Sure you
say an unordered dictionary could then be added for my needs, but... I like
my syntactic sugar, thank you :)

Ordered dictionaries can be useful, its good to see a standard implementation
going into the core, but the default? I hope not!

--Stephen
.



Relevant Pages

  • Re: "Parallel.For GC problems" and a solution.
    ... I think Dictionary(of dictionary(of small array of ints)) ... with the dictionaries holding arrays of keys, ... do not pass it a large array of things to process. ... With no breaks, memory goes crazy. ...
    (microsoft.public.dotnet.languages.vb)
  • Re: Writing huge Sets() to disk
    ... >> state and taking say 600MB, pushes it's internal dictionaries ... that this code takes a lot of extra memory. ... >> I believe it's the references problem, ... the swapspace reserved grows during that posted loop. ...
    (comp.lang.python)
  • Re: Writing huge Sets() to disk
    ... > references in python. ... > state and taking say 600MB, pushes it's internal dictionaries ... Almost anything you do copies references. ... that this code takes a lot of extra memory. ...
    (comp.lang.python)
  • Re: Parsing String, Dictionary Lookups, Writing to Database Table
    ... values to build an array of rows inserted into a sqlite3 database table. ... the values are determined from two dictionaries. ... comma-and-quote values for insertion in the database table? ...
    (comp.lang.python)
  • Re: what is the best datatype for..
    ... dictionaries etc will work, ... If I understand your suggestion correctly, you are proposing creating an array that will be scanned searching for the index value, at which point the count can be updated. ... Furthermore, for large data sets where the difference in memory footprint is likely to be of concern, the performance difference will be phenomenal. ...
    (microsoft.public.dotnet.languages.csharp)

Loading