B+-trees

From: Rico (ras_nas_at_yahoo.com)
Date: 07/29/04


Date: Fri, 30 Jul 2004 04:05:11 +0800

This is not a Java-specific question but it would definitely be meant for
a programmer. So I'm thinking I'm at least half on-topic here.

I'm trying to get a firmer grasp of B+-trees as used for indexing.
I have an input text file with around a thousand lines as follows:

0|Item65| leaguers rebuked realizing southwest haply <....> |81.44|1
1|Item55| Portugal movings grooms backstitching octane <.....> |86.8|3

A unique line number, an item label, a list of words associated with the
item, a price and an inventory level.

        http://cs.ecs.baylor.edu/~speegle/5335/indexes.html

I think that a purely RAM implementation means that I'd have my own node
structures into which I insert the data from the input file.
What if I need to make each node its own file and the pointers are
supposed to be filenames?
Am I still supposed to load the data into node structures or is the point
to just rely on the O/S taking care of loading the page as a disk block?

Also, the specifications just say that each node can have from 50 to 200
pointers. I think it makes sense for non-leaf nodes: with a disk block
size of 4096 bytes, and 20 bytes being enough to store a keyword/pointer
from which I can derive the filename, 200 pointers can fit in.

But I'm not sure whether that variable number of pointers applies to
leaf nodes, or bucket files, as well. After analyzing the data, e.g
highest occurrence of a keyword on distinct lines and longest keyword,
I'm thinking that it doesn't because 50 keywords and the list of line
numbers on which each occurs wouldn't fit in one disk block, let alone
then 200 keywords.

What I would like to know is, am I doing it right?
The quick analysis of the data was done programmatically using
hashtables. Is that "cheating" or is making such assumptions about the
data necessary in designing the B+-tree ?

Thanks.
Rico.



Relevant Pages

  • Re: Symbols for range/iteration/slicing
    ... loop as keyword. ... I would suggest the keyword 'range' ... words I take away from the programmer. ... Do you want C like statements with or without brace, ...
    (comp.lang.misc)
  • Re: Unsigned Indexing (was: some other sad story)
    ... >> possible to make C# unverifiable using the unsafe keyword. ... Clearly, then, it is possible to write insecure code in any language. ... smart pointers, raw arrays or container classes, etc. ...
    (comp.programming)
  • Re: Forward declaring classes
    ... the compiler will prepare for the worst and assume it ... MS specific __keyword I should write there in order ... What you're thinking of applies to pointers to members. ...
    (microsoft.public.vc.language)
  • Re: Large Dictionaries
    ... people thought creating a dict by keyword would be a useful thing to ... Adding the keyword syntax doesn't require much memory to a Python ... programmer because it's the same syntax used elsewhere. ...
    (comp.lang.python)
  • Re: Non static method error explanation?
    ... programmer use...use/add the keyword "static" in the calledRoutine ... structured programming I wasn't thinking objects, ... recursion in using keyword "new" in this example. ... because it's invoked from the constructor. ...
    (comp.lang.java.help)