Data structure behind Google Suggest



Can anyone think of an efficient data structure that could handle Google Suggest?

Google Suggest, if you haven't seen it, shows the top 10 searches that relate to the letters that you type into the search box. It operates as you type, so if you type "a", it will give you the top 10 searches that start with "a", "ab", the top ten that start with that, etc.

It goes down to an amazing level of detail, and shows even extremely obscure searches after you've typed two dozen characters. There has got to be a billion or more search phrases in their database.

How could you do this using a reasonable amount of hardware? Keep in mind that you send results *after every character typed*, so it has to be extremely fast.

The best I can come up with is a distributed btree. Perhaps you hash on the first few chars, use that to pick a computer to serve all queries that pertain to those letters, and then have a btree with a record for every query that starts with those letters. The record contains the top 10 searches. You do a lookup and return the 10.

The trouble with this scheme is that it would take a vast amount of disk space. A lot more than it would take to store a billion queries, primarily because there's a lot of duplication. Plus, you'd have to store everything in memory to get any kind of speed or throughput.

Anyone know of a better way?
.



Relevant Pages

  • Re: Nominations (4) -> Re: Several ways you can delete a large avi file, and other files that mig
    ... >> It doesn't seem to be sinking in, Damian. ... which is not rather odd becuase he posts out of google. ... ventilation; avoid extreme temperatures and store in a cool, ... away from open flames, naked flames and old flames; avoid inhaling fumes; ...
    (alt.computer.security)
  • Re: Automating Searches
    ... its for a once off educational project. ... There is a set number of queries, ... Google is being "ripped off" iff you do something like: ... Firefox with manually-typed queries ...
    (comp.lang.java.programmer)
  • Re: Automating Searches
    ... There is a set number of queries, ... Google is being "ripped off" iff you do something like: ... Then we have Firefox with a MRU for queries; ... using the search engine proper, though the SEO will impact the latter ...
    (comp.lang.java.programmer)
  • Re: Attn Tech Zero -- writely.com UPDATE!!!
    ... "bakedlite" appeared in history: ... And I heard about Writely's sale to google on a podcast I downloaded ... leave it that is how you make money online. ... you have to store files somewhere if you want to ...
    (news.software.readers)
  • Re: Attn Tech Zero -- writely.com UPDATE!!!
    ... I saw a headline that Google is planning an on-line ... And we know how Google would LOVE to store every ... you're working on on thier servers. ... user thier server space when sharing the doc between ...
    (news.software.readers)