Re: Running queries on large data structure



Christoph Haas wrote:
On Wednesday 02 August 2006 22:24, Christoph Haas wrote:
I suppose my former posting was too long and concrete. So allow me to try
it in a different way. :)

OK. I'll bite.

The situation is that I have input data that take ~1 minute to parse while
the users need to run queries on that within seconds. I can think of two
ways:

What is the raw data size?
Are there any efficiencies to be gained in the parsing code?

(1) Database
(very quick, but the input data is deeply nested and it would be
ugly to convert it into some relational shape for the database)

Depending on your tolerance for this ugliness, you could use a SQLite
'memory' database. It _might_ be faster than PostgreSQL, but you can't
tell until you profile it.
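
A rough sketch of the in-memory SQLite idea, using the sqlite3 module from the standard library. The nested `parsed` dict, the `service` table and its columns are made up for illustration; your real data will need its own flattening.

```python
import sqlite3

# Hypothetical nested input (an assumption): hosts containing lists of services.
parsed = {
    "web01": {"services": [{"name": "http", "port": 80},
                           {"name": "ssh", "port": 22}]},
    "db01": {"services": [{"name": "postgres", "port": 5432}]},
}

def load_into_memory_db(data):
    """Flatten the nested dict into one table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE service (host TEXT, name TEXT, port INTEGER)")
    rows = [
        (host, svc["name"], svc["port"])
        for host, info in data.items()
        for svc in info["services"]
    ]
    conn.executemany("INSERT INTO service VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_into_memory_db(parsed)
# Example query: which hosts expose a privileged port?
hosts = [row[0] for row in conn.execute(
    "SELECT DISTINCT host FROM service WHERE port < 1024 ORDER BY host")]
print(hosts)
```

Keep in mind that a ":memory:" database lives only as long as the process that created it, so it pays off only if that process stays up between requests.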

(2) cPickle
(Read the data every now and then, parse it, write the nested Python
data structure into a pickled file. Then let the other application
that does the queries unpickle the variable and use it time and
again.)

How hard would it be to create this nested structure? I've found that
pickling really large data structures doesn't save a huge amount of
time when reloading them from disk, but YMMV and you would have to
profile it to know for sure.
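
Something like this would give you a first number to compare. It is only a sketch: `parse()` is a stand-in for your real parser, the entry count is arbitrary, and it times unpickling from memory, so disk read time is not included. (On Python 3 the plain pickle module already uses the C implementation that cPickle used to provide.)

```python
import pickle
import time

def parse(n):
    """Stand-in for the real parser (an assumption): builds a nested dict."""
    return {"host%d" % i: {"ports": list(range(10)), "meta": {"id": i}}
            for i in range(n)}

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

data, parse_secs = timed(parse, 50000)
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored, load_secs = timed(pickle.loads, blob)

print("parse: %.3fs  unpickle: %.3fs  pickle size: %d bytes"
      % (parse_secs, load_secs, len(blob)))
```

Run it against your real parser and real data before deciding; the toy structure above says nothing about how your data will behave.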

So the question is: would you rather force the data into a relational
database and write object-relational wrappers around it? Or would you
pickle it and load it later and work on the data? The latter application
is currently a CGI. I'm open to whatever. :)

Convert your CGI to a persistent Python web server (I use CherryPy, but
you can pick whatever works for you) and store the nested data
structure globally. Reload/reparse as necessary. It saves the
pickle/unpickle step.
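
Leaving the web framework out of it, the "store it globally, reparse as necessary" part could look roughly like this inside a long-running process. The `parse_file()` function and its key=value file format are placeholders, and checking the file's modification time is just one possible way to decide when to reparse.

```python
import os
import tempfile

_data = None          # parsed structure, kept for the life of the process
_loaded_mtime = None  # modification time of the file when it was last parsed

def parse_file(path):
    """Stand-in for the slow parser (an assumption): one key=value per line."""
    with open(path) as f:
        return dict(line.strip().split("=", 1) for line in f if "=" in line)

def get_data(path):
    """Return the in-memory structure, reparsing only if the file changed."""
    global _data, _loaded_mtime
    mtime = os.stat(path).st_mtime_ns
    if _data is None or mtime != _loaded_mtime:
        _data = parse_file(path)
        _loaded_mtime = mtime
    return _data

# Small demonstration with a throwaway input file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a=1\nb=2\n")
first = get_data(tmp.name)   # parses the file
second = get_data(tmp.name)  # served from memory, no reparse
os.remove(tmp.name)
```

Every request handler in the server would call `get_data()` instead of parsing or unpickling, so only the first request after a change pays the parsing cost.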

In an application I'm working on, I create multiple 'views' off of a
single expensive database query. I tuck all of these views (read as
'deeply nested Python structures') into a cache with an expiration time
(currently 5 minutes in the future). My data layer checks the cache
before doing any queries and uses the appropriate view according to the
request. If the cache lookup misses or the entry has expired, I call the
expensive query and reload the cache. This way there is a 'fat' web page
every 5 minutes (load time of about 4 seconds on my dev box) and almost
every other page is sub-second.
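
Not my actual code, but the shape of it is roughly the following. The query, the two views and all the names are invented for the example, and the optional `now` argument is there only so the expiry can be demonstrated without waiting five minutes.

```python
import time

TTL_SECONDS = 5 * 60
_cache = {"expires": 0.0, "views": {}}
query_count = 0  # counts how often the expensive query actually runs

def expensive_query():
    """Stand-in for the slow database query (hypothetical rows)."""
    global query_count
    query_count += 1
    return [("web01", 80), ("web01", 22), ("db01", 5432)]

def build_views(rows):
    """Build every view from a single query result."""
    by_host, by_port = {}, {}
    for host, port in rows:
        by_host.setdefault(host, []).append(port)
        by_port[port] = host
    return {"by_host": by_host, "by_port": by_port}

def get_view(name, now=None):
    """Return the named view, reloading all views on a miss or after expiry."""
    now = time.monotonic() if now is None else now
    if not _cache["views"] or now >= _cache["expires"]:
        _cache["views"] = build_views(expensive_query())
        _cache["expires"] = now + TTL_SECONDS
    return _cache["views"][name]

get_view("by_host", now=0)    # miss: runs the query
get_view("by_port", now=100)  # hit: different view, same query result
get_view("by_host", now=301)  # expired: runs the query again
print(query_count)
```

The request that arrives after expiry pays for the reload (the 'fat' page); every other request within the window only does a dictionary lookup.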

Thanks for any enlightenment.
Just my 2 cents.

....
jay graves
