Re: fast method accessing large, simple structured data



agc wrote:
Hi,

I'm looking for a fast way of accessing some simple (structured) data.

The data is like this:
Approx. 6-10 GB of simple XML files; the only elements
I really care about are the <title> and <article> ones.

So what I'm hoping to do is put this data in a format so
that I can access it as fast as possible for a given request
(http request, Python web server) that specifies just the title,
and I return the article content.

Is there some good format that is optimized for searching on
just one attribute (title) and then returning the corresponding article?

I've thought about putting this data in a SQLite database because
from what I know SQLite has very fast reads (no network latency, etc.)
but not as fast writes, which is fine because I probably won't be doing
much writing (I won't ever care about the speed of any writes).

So is a database the way to go, or is there some other,
more specialized format that would be better?
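
For reference, the SQLite idea described above might look roughly like the sketch below. It is only an illustration: the table name "articles", the column names, and the helper functions build_index and lookup are assumptions made here, not anything from the original post, and it presumes the (title, article) pairs have already been extracted from the XML. Declaring the title as the primary key gives SQLite an index to use for exact-title lookups.

```python
import sqlite3


def build_index(conn, records):
    """Create the (hypothetical) articles table and load (title, body) pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    # 'with conn' wraps the bulk insert in a single transaction.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO articles (title, body) VALUES (?, ?)",
            records,
        )


def lookup(conn, title):
    """Return the article body for an exact title, or None if absent."""
    row = conn.execute(
        "SELECT body FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


# Toy data standing in for pairs parsed out of the XML files.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("Python", "Python is a programming language."),
    ("SQLite", "SQLite is an embedded database."),
])
print(lookup(conn, "Python"))
```

With a real data set the connection would point at a file on disk rather than ":memory:", and the table would be built once, ahead of serving requests.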


"Database" without any further qualification implies exact matching, which doesn't seem very practical for article titles: a request would have to reproduce the stored title character for character. There is an enormous body of literature on inexact (fuzzy) matching, and lots of deployed applications, so it's not really a Python-related question.
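
To make the exact-versus-inexact distinction concrete, here is a small sketch using the standard library's difflib module. The helper name closest_titles and the sample titles are made up for illustration. This approach compares the query against every title in memory, so it only demonstrates the idea and is not a suggestion for data of the size discussed.

```python
import difflib


def closest_titles(query, titles, n=3, cutoff=0.6):
    """Return up to n stored titles that approximately match the query."""
    # Compare case-insensitively; titles differing only in case collapse
    # to one entry in this simplified sketch.
    lowered = {t.lower(): t for t in titles}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[h] for h in hits]


titles = ["Monty Python", "Python (programming language)", "Pythagoras"]

# A misspelled request still finds the intended title first.
print(closest_titles("monty pyton", titles))
```

An exact-match lookup would return nothing for the misspelled request, which is the practical concern raised above.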
