Re: "index" efficiency... any help or ideas?

From: Bx. C (null_at_the.void)
Date: 12/15/03


Date: Mon, 15 Dec 2003 09:36:56 -0600


> This is an n-ary tree as opposed to a binary tree. There's lots of
> literature out there on the performance of this technique. It's normally
> used where there is a requirement to have not only high speed indexing,
but
> the ability to insert new keys in the table and keep the search fairly
well
> balanced (hence linear in time for any key) without requiring
> reorganisation. It's particulalry well suited to disk based tables. This
> appears to be a requirement you don't have.

oops.. i forgot to mention that too.. i need to be able to add to it, and
keep the same performance... but the reason i'm saying that it has to be
hard-coded into the program, is because file system(s) aren't accessible
from within the program... only before and after loading... (i can't be too
specific about that at this time, though.. sorry)

> >
> > I find this way is better than checking the first dword of the command
and
> > doing a right side or left side drop, because too many of the commands
are
> 3
> > characters... almost exhausting the whole list of 3 character
> > combinations... though there are just as many with 4 characters...
also...
> > the space is used at the end of the commands because many of these
> commands
> > take parameters, so there will be a space in the input anyway, to
separate
> > the command from its parameters...
> >
> > additionally.. the commands are not used in equal amounts... there are
> many
> > commands that are used quite more often than others... thus, I realize I
> > will have to ultimately sort at least the top level (and maybe a few
> > succeeding levels) for better efficiency... but because of this reason,
a
> > "higher" or "lower" value can't be relied upon for narrowing where you
> > are... (also, space efficiency is a desired component, though it may
have
> to
> > be thrown out, depending on how good a performance increase there can
> be)...
>
> That's still a lot of checking; with a good hash, you're looking at the
hash
> itself, the link to the list head from the hash table and 1 compare
> optimally. With a little care on the hash algorithm and a large enough
hash
> table, 90%+ of the searches can be made to perform as well as this. There
> may be a few pathological cases where the depth is greater than 1; a trial
> pre-hash to check would point these out (that is, those cases where DOG
may
> have the same hash as MAD). The building of such a hash table is trivial
in
> comparison to the n-ary tree technique.

in my current version of my n-ary tree... here's how i have it set up... two
structures... one for the nodes (or as i like to call them, the "knots")...
and 1 for a list of pointers...

knot_base:
1 byte number of possibilities for character
x bytes string of possible characters, x=# of possib chars
(repeat last two lines as necessary)

pointer_base:
1 word 0 ...so we don't have to waste a cycle on DEC DI after SCASB
1 word 0 ...placeholder for 1-byte length in knot_base
x words list of pointers... x = # of possib. chars
(repeat last two lines as necessary)

the word-sized pointers have two meanings, depending on what the matched
character is... if the matched character is a space, the pointer is the
absolute jump address to the handler for that command... if the matched
character is not a space, then the pointer is an offset relative to
knot_base, pointing to the next knot (next knot = possibilities for next
character)... repeat as necessary...

to determine which pointer cooresponds to which character at each knot, i
just keep indexing the knots relative to the knot_base... using SCASB to
scan the character list... (verify the match... cx could've reached 0 with
no match)... take the knot_base index position, shift left once, and use the
new number as the index into the pointer_base.... (it's lined up
correctly... i've checked)

but....
...now i'm trying to decide whether "x dwords" in the list of knots, would
be better... doing SCASD.. and then after a match, just checking to see if
the last character of the match was a space... (that would mean extending
everything to a size multiple of 4 bytes)... but, that also means i'd have
to do a little processing on each 4 bytes taken in from the command... check
to see if any byte is a space, and if so, make the remaining characters a
space to match (or instead of spaces, since i have to pre-process the user
input before checking the list, i could change all my spaces to 0 instead...
doesn't really matter) i'm curious on how much of a performance increase
that would be... yet also curious to see how much more wasted space i'd
have... (oh, and since my checks would be a dword at a time, i'd probably
increase the pointer size to dwords... and take out the shift left in the
above paragraph..)

hmmm... well.. my current next step, in either case... is to make a program
that'll make my tree structure from my list...

> >
> > i hope i've explained the issue a little better, and ask if everyone
still
> > feels their own tree method could still work for my strange needs in
this
> > case...
> >
> > thanks in advance,
> > Bx.C / Shiragajin
> >
> > PS. (36 characters in the alphabet? oh my!! hehe... that was
hilarious...)
> > PPS. (sorry.. i'm kinda rushed right now so i didn't have the time to
put
> > that one in its own message, where it belonged.. hehe)
>
> Nathan was very good humoured about his mistake I thought. It certainly
gave
> me a chuckle.
>

same here

Bx.C



Relevant Pages

  • Re: How to I print without newline ?
    ... ftp usually supports the command 'hash' ... when you use it it displays that character every X bytes. ... but the print is drop a newline for every percent. ...
    (comp.lang.python)
  • vi editor FAQ (Frequently Asked Question List), Part 2/2
    ... has the UCB distribution of vi, and lots of useful macros. ... m0 is the ex command to move the line to line 0. ... Swap character and one vertically above: ... A non-visual editor under Unix. ...
    (comp.unix.questions)
  • vi editor FAQ (Frequently Asked Question List), Part 2/2
    ... has the UCB distribution of vi, and lots of useful macros. ... m0 is the ex command to move the line to line 0. ... Swap character and one vertically above: ... A non-visual editor under Unix. ...
    (comp.unix.questions)
  • vi editor FAQ (Frequently Asked Question List), Part 2/2
    ... has the UCB distribution of vi, and lots of useful macros. ... m0 is the ex command to move the line to line 0. ... Swap character and one vertically above: ... A non-visual editor under Unix. ...
    (comp.unix.questions)
  • vi editor FAQ (Frequently Asked Question List), Part 2/2
    ... has the UCB distribution of vi, and lots of useful macros. ... m0 is the ex command to move the line to line 0. ... Swap character and one vertically above: ... A non-visual editor under Unix. ...
    (comp.unix.questions)

Loading