Re: Choosing a DataStructure
From: srinivasa kj (srinivasakj_at_yahoo.com)
Date: 11/20/03
- Next message: Arthur J. O'Dwyer: "Re: bitmap (or other) reader source code"
- Previous message: srinivasa kj: "Re: Developpement"
- In reply to: Santosh: "Re: Choosing a DataStructure"
- Next in thread: Santosh: "Re: Choosing a DataStructure"
- Reply: Santosh: "Re: Choosing a DataStructure"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 19 Nov 2003 21:34:24 -0800
Hi,
For large number of entries, Databases use B+ tree indexing, Where
searching and sorting is less laborious(especially when we are
considering huge amount of entries). For small amount of entries,
sometimes databases also uses Hash based indexing. Hope this helps.
Thanks
Srinivasa KJ
santosh_jain99@yahoo.com (Santosh) wrote in message news:<a5b962be.0311190641.623ef49b@posting.google.com>...
> Hello,
>
> The reason I mentioned a billion records was to just drive the point
> that a in-memory sort was not possible. We need to use some kind of a
> external sorting algorithm like mergesort. Also we can safely assume
> the size of a record to be 128 bytes.
>
> The main question that I had was how to store the data ? i.e. whether
> to store it in B-Tree with one of the fields say the name as the key
> and a pointer to the record as another field ? Or do I just use a
> array of pointers to records?
>
> The array of pointers to records seems to be the right because it
> helps in sorting the records with multiple criteria much easily.
>
> I am just curious as to how databases implement this ? I mean we can
> sort the records with multiple fields in ascending or descending order
> or which ever way we like ?
>
> i.e. in a database we can say
>
> Select * From Table1
> ORDER BY column1 ASC, column2 DESC
>
> I am trying to find out how can I implement same functionality without
> using a database but developing my own utility ?
>
> Thanks
>
>
>
>
>
> Richard Heathfield <dontmail@address.co.uk.invalid> wrote in message news:<bpeu83$3sj$1@hercules.btinternet.com>...
> > Santosh wrote:
> >
> > > Hello,
> > >
> > > I would like some input on choosing a datastructure and a algorithm. I
> > > have a text file which contains three strings(say name, phonenumber
> > > and city). The file contains a about a billion records.
> >
> > Even if you mean American billions, that's a lot of records. Alas, you don't
> > tell us how big the records are. A name could be anything up to a 128-byte
> > field (or more!), but let's be charitable and say it's only a very
> > restrictive 32 bytes. The phone number - hmmmm - half the planet is in this
> > database; we're going to have to make that, say, 16 bytes. City? Call it
> > another 16 bytes (and let LA go swing). So - with minimal assumptions,
> > we're looking at approximately 128,000,000,000 bytes. That's a /lot/ of
> > storage.
> >
> > > I need to choose a datastructure which will sort efficienctly based on
> > > any of the strings(keys) which is may be any of the three or a
> > > combination of three in which case we will need to sort with multiple
> > > keys.
> >
> > Well, if you have the RAM for it, just a straight array, and a customised
> > QuickSort. But you don't have the RAM for it, do you? Assuming you have
> > well over 256GB of disk space, you could just about squeeze a merge sort in
> > there. See Knuth TAOCP 3.
> >
> > > What is the best datastructure to store this data?
> >
> > In most cases, there would be several interesting answers. In your case, I
> > think it has to be an array. Anything more complicated will almost
> > certainly impose too much overhead.
> >
> > > the problem here is that the key is not fixed. It could be the name,
> > > phonenumber or the city and sometimes we nmight also need to sort
> > > first by name and then by city.
> >
> > Well, that's all right. You just use several custom sorts, and have
> > different comparisons for each sort. Don't, whatever you do, use qsort, or
> > any other function that uses a callback mechanism for comparisons, since
> > this will involve you in many thousands of millions of function calls. Such
> > luxuries are for those of us with more normally-sized databases.
- Next message: Arthur J. O'Dwyer: "Re: bitmap (or other) reader source code"
- Previous message: srinivasa kj: "Re: Developpement"
- In reply to: Santosh: "Re: Choosing a DataStructure"
- Next in thread: Santosh: "Re: Choosing a DataStructure"
- Reply: Santosh: "Re: Choosing a DataStructure"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|