Re: optimized string storage for phrases/dictionary
- From: "Nameless" <news.mail@xxxxxxxxx>
- Date: Wed, 27 Apr 2005 04:56:00 +0200
<dagny_orearden@xxxxxxxxx> wrote in message
news:1114533009.892165.299080@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> I'm investigating what the optimal data structure for storage
> of phrases that recur multiple times but with small variances.
> The goal is to use up as little memory as possible, be able
> to look up the original string, return it and allow the caller
> to use it without regard to how it's stored internally.
>
> Some of the sample data might look like this:
> hello_there/here_is_very_long_string/abcdefghijklmnopq
> hello_there/here_is_very_long_string/abcdefghijklmnopr
> hello_there/here_is_very_long_string/abcdefghijklmnost
> hello_there/here_is_very_long_string/abcdefghijklmnvsd
> hello_there_2/here_is_very_long_string/abcdefghijklmnopq
> hello_there_2/here_is_very_long_string/abcdefghijklmnopr
> hello_there_2/here_is_very_long_string/abcdefghijklmnost
> hello_there_3/here_is_very_long_string/abcdefghijklmnvsd
>
> I'll have millions of strings like this, some of them are much
> longer than what's listed above whereas others maybe short.
>
> Typically, if you have the same string occuring multiple times,
> you can use reference counted strings. But here, it's slight
> difference since the strings aren't exactly indentical but just
> slightly variant enough at different places.
> I can split up the strings using "/" as a delineator but in
> terms of the optimal data structure for storage, I'm not sure
> if a tree would be ideal or something else.. I'm sure there's
> been some sort of research done on this problem and probably a
> name for it.
>
> Anyone have any suggestions or ideas?
Ternary tree, perhaps?
---
Mail sent to this email address is deleted unread
on the server. Please send replies to the newsgroup.
.
- References:
- optimized string storage for phrases/dictionary
- From: dagny_orearden
- optimized string storage for phrases/dictionary
- Prev by Date: Re: discuss dancing links
- Next by Date: Reply: Time dependent shortest path
- Previous by thread: Re: optimized string storage for phrases/dictionary
- Next by thread: Re: optimized string storage for phrases/dictionary
- Index(es):
Relevant Pages
|