optimized string storage for phrases/dictionary



I'm investigating what the optimal data structure for storage of
phrases that recur multiple times but with small variances. The goal is
to use up as little memory as possible, be able to look up the original
string, return it and allow the caller to use it without regard to how
it's stored internally.

Some of the sample data might look like this:
hello_there/here_is_very_long_string/abcdefghijklmnopq
hello_there/here_is_very_long_string/abcdefghijklmnopr
hello_there/here_is_very_long_string/abcdefghijklmnost
hello_there/here_is_very_long_string/abcdefghijklmnvsd
hello_there_2/here_is_very_long_string/abcdefghijklmnopq
hello_there_2/here_is_very_long_string/abcdefghijklmnopr
hello_there_2/here_is_very_long_string/abcdefghijklmnost
hello_there_3/here_is_very_long_string/abcdefghijklmnvsd

I'll have millions of strings like this, some of them are much longer
than what's listed above whereas others maybe short.

Typically, if you have the same string occuring multiple times, you can
use reference counted strings. But here, it's slight difference since
the strings aren't exactly indentical but just slightly variant enough
at different places.
I can split up the strings using "/" as a delineator but in terms of
the optimal data structure for storage, I'm not sure if a tree would be
ideal or something else.. I'm sure there's been some sort of research
done on this problem and probably a name for it.

Anyone have any suggestions or ideas?

Thanks,
Dagny

.