Re: Low-level, Efficient Ternary Search Trees for Storing Strings?
- From: Rainer Joswig <joswig@xxxxxxx>
- Date: Thu, 29 Mar 2007 23:37:53 +0100
In article <joswig-1E03F4.23345629032007@xxxxxxxxxxxxxxxxxxxxxxxx>,
Rainer Joswig <joswig@xxxxxxx> wrote:
In article <1175202616.126251.134340@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
"dpapathanasiou" <denis.papathanasiou@xxxxxxxxx> wrote:
After reading Bentley and Sedgewick's paper on using Ternary Search
Trees for storing strings (http://www.ddj.com/dept/windows/184410528),
I wrote an implementation in CL, using a simple char array as the trie
structure.
It works, and searching is fast, but the resulting tries (arrays) are
massive, especially when saved to disk.
Then I saw this post about packing multiple datapoints into a single
object -- fixnum, in his case, since he's dealing with integers
(http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/
f5b364a601c86987), and I was wondering if there's any similar low-
level, efficient structure that can be used to replace the char array
in my code?
Here's my code:
(defpackage :trie
(:use
:common-lisp)
(:export
:make-trie
:insert-trie
:search-trie)
(:documentation
"Library of ternary trees, inspired by Bentley & Sedgewick
article."))
(in-package :trie)
(defun make-trie ()
"Return a trie object, an array, defined as:
0 = split-char
1 = lo-kid
2 = eq-kid
3 = hi-kid"
(make-array '(4) :element-type 'char :initial-contents '(nil nil nil
nil)))
That's not right. You have an array of characters.
NIL is not a character. You can say element type is
character and then initialize the elements to NIL
(which is not a character).
I mean "you can't say.."
Plus:
Element 0 is a character. What are elements 1, 2 and 3?
They are others TRIEs. Not characters. You
can use a general ARRAY, but not a character ARRAY.
Use a structure instead of an array. See below.
(defun get-trie-val (trie index)
"Access the trie value defined by index:
0 = split-char
1 = lo-kid
2 = eq-kid
3 = hi-kid"
(aref trie index))
(defun get-split-char (trie)
(get-trie-val trie 0))
(defun get-lo-kid (trie)
(get-trie-val trie 1))
(defun get-eq-kid (trie)
(get-trie-val trie 2))
(defun get-hi-kid (trie)
(get-trie-val trie 3))
(defun set-trie-val (trie index val)
"Set the trie value defined by index:
0 = split-char
1 = lo-kid
2 = eq-kid
3 = hi-kid"
(setf (aref trie index) val))
SETF is better used to update data. You would
also want to write your interfaces to provide SETF
functions. You can do it with (defun (setf ... )...).
But usully you get it with DEFSTRUCT for free. See below.
(defun set-split-char (trie val)
(set-trie-val trie 0 val))
(defun set-lo-kid (trie val)
(set-trie-val trie 1 val))
(defun set-eq-kid (trie val)
(set-trie-val trie 2 val))
(defun set-hi-kid (trie val)
(set-trie-val trie 3 val))
A remark: all this boilerplate code could be replaced
with a single DEFSCTRUCT
(defstruct trie ...)
Then you'd get a structure object when you call MAKE-TREE.
Note that you can also use DEFSTRUCT when you want
an array as the underlying datastructure. There is a :TYPE
option to DEFSTRUCT where you can specify that.
You could also use DEFCLASS if you'd like the flexibility of CLOS
classes.
(defun trie-empty (trie)
"A trie is empty if its split-char value is undefined."
(null (get-split-char trie)))
(defun insert-trie (trie str)
"Add the string to the trie."
(when (> (length str) 0)
(let ((c (char-downcase (char str 0))))
(when (trie-empty trie)
(setf trie (make-trie))
(set-split-char trie c)
(set-lo-kid trie (make-trie))
(set-eq-kid trie (make-trie))
(set-hi-kid trie (make-trie)))
(cond ((char< c (get-split-char trie))
(set-lo-kid trie (insert-trie (get-lo-kid trie) str)))
((char= c (get-split-char trie))
(set-eq-kid trie (insert-trie (get-eq-kid trie) (subseq
str 1))))
(t
(set-hi-kid trie (insert-trie (get-hi-kid trie) str))))))
trie)
(defun search-trie (trie str)
"Return t if the string exists in the trie, nil if it does not."
(if (> (length str) 0)
(let ((c (char-downcase (char str 0))))
(cond ((null (get-split-char trie))
nil) ; run off end w/o match
((char< c (get-split-char trie))
(search-trie (get-lo-kid trie) str))
((char> c (get-split-char trie))
(search-trie (get-hi-kid trie) str))
(t
(search-trie (get-eq-kid trie) (subseq str 1)))))
t))
--
http://lispm.dyndns.org
.
- References:
- Low-level, Efficient Ternary Search Trees for Storing Strings?
- From: dpapathanasiou
- Re: Low-level, Efficient Ternary Search Trees for Storing Strings?
- From: Rainer Joswig
- Low-level, Efficient Ternary Search Trees for Storing Strings?
- Prev by Date: Re: Low-level, Efficient Ternary Search Trees for Storing Strings?
- Next by Date: Re: lexical scoping
- Previous by thread: Re: Low-level, Efficient Ternary Search Trees for Storing Strings?
- Next by thread: Re: Low-level, Efficient Ternary Search Trees for Storing Strings?
- Index(es):
Relevant Pages
|
|