Data structure behind Google Suggest
- From: Chris <spam_me_not@xxxxxxxxxx>
- Date: Mon, 24 Sep 2007 21:38:08 -0500
Can anyone think of an efficient data structure that could handle Google Suggest?
Google Suggest, if you haven't seen it, shows the top 10 searches that relate to the letters that you type into the search box. It operates as you type, so if you type "a", it will give you the top 10 searches that start with "a", "ab", the top ten that start with that, etc.
It goes down to an amazing level of detail, and shows even extremely obscure searches after you've typed two dozen characters. There has got to be a billion or more search phrases in their database.
How could you do this using a reasonable amount of hardware? Keep in mind that you send results *after every character typed*, so it has to be extremely fast.
The best I can come up with is a distributed btree. Perhaps you hash on the first few chars, use that to pick a computer to serve all queries that pertain to those letters, and then have a btree with a record for every query that starts with those letters. The record contains the top 10 searches. You do a lookup and return the 10.
The trouble with this scheme is that it would take a vast amount of disk space. A lot more than it would take to store a billion queries, primarily because there's a lot of duplication. Plus, you'd have to store everything in memory to get any kind of speed or throughput.
Anyone know of a better way?
.
- Follow-Ups:
- Re: Data structure behind Google Suggest
- From: Chris F Clark
- Re: Data structure behind Google Suggest
- From: Paul E. Black
- Re: Data structure behind Google Suggest
- Prev by Date: Modelling mobility with Petri nets
- Next by Date: Generalized Quine-McCluskey
- Previous by thread: Modelling mobility with Petri nets
- Next by thread: Re: Data structure behind Google Suggest
- Index(es):
Relevant Pages
|