Re: Low-level, Efficient Ternary Search Trees for Storing Strings?



On Mar 30, 2:56 pm, "dpapathanasiou" <denis.papathanas...@xxxxxxxxx>
wrote:
Point taken about the array definition; I should *not* use :element-
type 'char since only :split-char is a char and all the :*-kid
components are themselves trie objects, not chars.

But even if I switch from an array to a defstruct, I still wind up
with a very bulky S-expression.

Using this (which, as you say is better, since it offers built-in
make, access and setting functions):

(defstruct trie
split-char
lo-kid
eq-kid
hi-kid)

If I create a trie to hold a two-letter string, e.g. "as", I get this
S-expression:

#S(TRIE
:SPLIT-CHAR #\a
:LO-KID #S(TRIE :SPLIT-CHAR NIL :LO-KID NIL :EQ-KID NIL :HI-KID
NIL)
:EQ-KID #S(TRIE
:SPLIT-CHAR #\s
:LO-KID #S(TRIE
:SPLIT-CHAR NIL
:LO-KID NIL
:EQ-KID NIL
:HI-KID NIL)
:EQ-KID #S(TRIE
:SPLIT-CHAR NIL
:LO-KID NIL
:EQ-KID NIL
:HI-KID NIL)
:HI-KID #S(TRIE
:SPLIT-CHAR NIL
:LO-KID NIL
:EQ-KID NIL
:HI-KID NIL))
:HI-KID #S(TRIE :SPLIT-CHAR NIL :LO-KID NIL :EQ-KID NIL :HI-KID
NIL))

Serialized to disk, that's 866 bytes.

Versus just 2 bytes (4 counting the double-quotes to denote it as a
string object) if I write "as" just like that.

So what I was getting at was: is there any way of packing the trie
data into a low-level, compact, efficient data structure, similar to
what the other post did for multiple integers into fixnums?

I'd like to be able to use tries for searching, but if I'm stuck
serializing such large S-expressions, I'd be better off saving them as
raw strings and using a fast library like cl-ppcre instead.

How about storing it as #S(as) and make the reader turn that into a
trie structure?

Also, are the NIL-only parts really necessary? Could your example be
reduced to the following?

#S(TRIE
:SPLIT-CHAR #\a
:LO-KID NIL
:EQ-KID #S(TRIE
:SPLIT-CHAR #\s
:LO-KID NIL
:EQ-KID NIL
:HI-KID NIL)
:HI-KID NIL)

.



Relevant Pages