Re: Boolean Query Algorithm



Hashes are not a good idea in this context. If you only need to match whole words, use a Patricia trie. If you also want to find partial strings, look at suffix trees in general.
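To make the whole-word case concrete, here is a minimal sketch in Python. Note that it is a plain, uncompressed trie, not a real Patricia trie (which additionally merges chains of single-child nodes), and the names Trie, insert, contains and with_prefix are made up for this example. The per-node dict is only a convenient child map. It shows what a hash of whole words cannot give you: lookups by prefix.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if an inserted word ends at this node


class Trie:
    """Plain (uncompressed) trie; a Patricia trie would merge single-child chains."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        # Walk down the trie along prefix; return None if the path does not exist.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix):
        # Collect every stored word that starts with prefix.
        node = self._find(prefix)
        if node is None:
            return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

After inserting "bool", "boolean" and "query", contains("boo") is False, while with_prefix("boo") returns both "bool" and "boolean".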

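Store the position and length of each occurrence with the nodes and/or leaves. When searching for a combination of two terms, take an occurrence of the first term (position P1, length L1) and an occurrence of the second (position P2), and check whether P1+L1 is close enough to P2, allowing for delimiters such as whitespace.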
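A rough sketch of that check in Python, assuming each term's occurrences are available as a list of (position, length) pairs. The helper find_all only stands in for the lookup in the trie or suffix tree, and the names find_all, adjacent and max_gap are invented for this example.

```python
def find_all(text, word):
    """Stand-in for the index lookup: (position, length) of every occurrence."""
    occurrences = []
    start = text.find(word)
    while start != -1:
        occurrences.append((start, len(word)))
        start = text.find(word, start + 1)
    return occurrences


def adjacent(occ1, occ2, max_gap=1):
    """Return (P1, P2) pairs where term 2 starts at most max_gap characters
    after the end of term 1, i.e. 0 <= P2 - (P1 + L1) <= max_gap."""
    hits = []
    for p1, l1 in occ1:
        end1 = p1 + l1
        for p2, _ in occ2:
            if 0 <= p2 - end1 <= max_gap:
                hits.append((p1, p2))
    return hits


text = "boolean query algorithm for a boolean query"
pairs = adjacent(find_all(text, "boolean"), find_all(text, "query"))
```

Here max_gap=1 allows exactly one delimiter character between the two terms; raise it if you want to tolerate longer runs of whitespace or punctuation. The nested loop is quadratic in the number of occurrences, which is fine for a sketch but could be replaced by a merge over the two sorted lists.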

As for Google: they developed their own file system, the Google File System, which is pretty well documented on the net. The search algorithms themselves are, as far as I know, a trade secret.

Regards
//Herbert

--
http://herbert.wikispaces.com