Re: fast method accessing large, simple structured data
- From: John Machin <sjmachin@xxxxxxxxxxx>
- Date: Sat, 02 Feb 2008 21:50:47 GMT
agc wrote:
Hi,
I'm looking for a fast way of accessing some simple (structured) data.
The data is like this:
Approx. 6-10 GB of simple XML files, where the only elements
I really care about are the <title> and <article> ones.
So what I'm hoping to do is put this data in a format so
that I can access it as fast as possible for a given request
(http request, Python web server) that specifies just the title,
and I return the article content.
Is there some good format that is optimized for search for
just 1 attribute (title) and then returning the corresponding article?
I've thought about putting this data in a SQLite database because
from what I know SQLite has very fast reads (no network latency, etc.)
but not as fast writes, which is fine because I probably won't be doing
much writing (I won't ever care about the speed of any writes).
So is a database the way to go, or is there some other,
more specialized format that would be better?
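
For reference, the SQLite idea described above could look roughly like the sketch below. It is only an illustration under assumptions: the table name `articles`, the helper names `build_index` and `lookup`, and the sample rows are all made up here, and parsing the XML into (title, article) pairs is left out. It relies only on Python's standard `sqlite3` module; making `title` the primary key gives SQLite an index to use for the lookup.

```python
import sqlite3

def build_index(conn, records):
    """Create a title -> article table and load (title, article) pairs.

    The table and column names are hypothetical, chosen for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT PRIMARY KEY, article TEXT NOT NULL)"
    )
    with conn:  # commit the inserts as a single transaction
        conn.executemany(
            "INSERT OR REPLACE INTO articles (title, article) VALUES (?, ?)",
            records,
        )

def lookup(conn, title):
    """Return the article stored under exactly this title, or None."""
    row = conn.execute(
        "SELECT article FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None

# Tiny in-memory demo; a real setup would open a file on disk instead.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("Python", "Python is a programming language."),
    ("SQLite", "SQLite is an embedded database."),
])
print(lookup(conn, "Python"))
```

Note that the `=` comparison here is exact and, with SQLite's default collation, case-sensitive, so a request for "python" would not find the row titled "Python".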
"Database" without any further qualification indicates exact matching, which doesn't seem to be very practical in the context of titles of articles. There is an enormous body of literature on inexact/fuzzy matching, and lots of deployed applications -- it's not a Python-related question, really.
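
To make the exact-versus-inexact distinction concrete, here is a small sketch of one simple form of inexact matching, using `difflib.get_close_matches` from the standard library. The helper names `normalize` and `closest_titles` and the sample titles are invented for illustration. This is not a recommendation for the 6-10 GB case: `get_close_matches` compares the query against every candidate in turn, so it only suits small title lists, and serious fuzzy search would need a purpose-built index.

```python
import difflib

def normalize(title):
    """Case-fold and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.casefold().split())

def closest_titles(query, titles, n=3, cutoff=0.6):
    """Return up to n stored titles that resemble the query, best match first.

    cutoff is difflib's similarity threshold in the range [0, 1].
    """
    by_key = {normalize(t): t for t in titles}
    keys = difflib.get_close_matches(
        normalize(query), list(by_key), n=n, cutoff=cutoff
    )
    return [by_key[k] for k in keys]

# Hypothetical sample data.
titles = ["Python (programming language)", "Monty Python", "SQLite"]
print(closest_titles("SQLlite", titles))
```

An exact lookup would miss the misspelled "SQLlite" entirely, whereas the similarity search still ranks "SQLite" first.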