Re: optimized string storage for phrases/dictionary



dagny_orearden@xxxxxxxxx wrote:
> I'm investigating what the optimal data structure for storage of
> phrases that recur multiple times but with small variances. The goal is
> to use up as little memory as possible, be able to look up the original
> string, return it and allow the caller to use it without regard to how
> it's stored internally.
> Some of the sample data might look like this:
> hello_there/here_is_very_long_string/abcdefghijklmnopq
> hello_there/here_is_very_long_string/abcdefghijklmnopr
> hello_there/here_is_very_long_string/abcdefghijklmnost
> hello_there/here_is_very_long_string/abcdefghijklmnvsd
> hello_there_2/here_is_very_long_string/abcdefghijklmnopq
> hello_there_2/here_is_very_long_string/abcdefghijklmnopr
> hello_there_2/here_is_very_long_string/abcdefghijklmnost
> hello_there_3/here_is_very_long_string/abcdefghijklmnvsd

I would suggest a "trie", but Paul Black already responded
and if he doesn't think it's a trie then I'm not so sure
anymore. (Googling for "trie" brings up Paul's page on the
subject as the first link.)

--Jamie. (a Dover edition designed for years of use!)
andrews .uwo } Merge these two lines to obtain my e-mail address.
@csd .ca } (Unsolicited "bulk" e-mail costs everyone.)
.