Populating a dictionary, fast




The id2name.txt file maps primary keys to strings, one record per line. The lines look like this:

11293102971459182412:Descriptive unique name for this record\n
950918240981208142:Another name for another record\n

The file's properties are:

# wc -l id2name.txt
8191180 id2name.txt
# du -h id2name.txt
517M id2name.txt

I'm loading the file into memory with code like this:

id2name = {}
for line in iter(open('id2name.txt').readline,''):
    id,name = line.strip().split(':')
    id = long(id)
    id2name[id] = name

This takes about 45 *minutes*.

If I comment out the last line in the loop body, it takes only about 30 *seconds* to run.
This would seem to implicate the line id2name[id] = name as being excruciatingly slow.
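
A minimal sketch of how the two cases could be timed side by side on synthetic data, to check whether the dictionary store itself accounts for the difference. The helper names (make_lines, parse_only, parse_and_store, timed) and the generated records are assumptions for illustration, not part of the real file or program. int() stands in for long() so that the sketch also runs on Python 3; the demo size is far smaller than the real 8191180 lines and should be raised to get closer to the actual workload.

```python
import time


def make_lines(n):
    """Build n synthetic 'id:name' lines shaped like the id2name.txt samples."""
    return ['%d:Descriptive unique name for record %d\n' % (10**18 + i * 7919, i)
            for i in range(n)]


def parse_only(lines):
    """Parse every line but discard the result (the 'last line commented out' case)."""
    count = 0
    for line in lines:
        id_, name = line.strip().split(':')
        id_ = int(id_)
        count += 1
    return count


def parse_and_store(lines):
    """Parse every line and store it in a dict (the full loop body)."""
    id2name = {}
    for line in lines:
        id_, name = line.strip().split(':')
        id2name[int(id_)] = name
    return id2name


def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds) for one call of fn."""
    start = time.time()
    result = fn(*args)
    return result, time.time() - start


# Small demo run; the size here is an arbitrary assumption.
lines = make_lines(200000)
_, t_parse = timed(parse_only, lines)
table, t_store = timed(parse_and_store, lines)
print('parse only:      %.3f s' % t_parse)
print('parse and store: %.3f s (%d entries)' % (t_store, len(table)))
```

If the gap between the two timings stays small at sizes near the real file, the slowdown probably comes from something other than the assignment alone, such as memory pressure on the machine.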

Is there a fast, functionally equivalent way of doing this?

(Yes, I really do need this cached. No, an RDBMS or disk-based hash is not fast enough.)


