Large-scale full-text search: how to get the intersection list fast?



The tale says full-text search engines precompute a list of pages which
contain a certain keyword. So, it's very easy to return a list of
results if only one keyword is submitted.

However, if more than one keyword are submitted for results containing
all of these keywords, how do you quickly get the intersection list of
several very long page-lists each led by one keyword?

In Google:
cat: 423,000,000
dog: 334,000,000
cat dog: 56,200,000

Information theory says the intersection time is linear to the length
of the shorter of the two lists to intersect.

My guess is that there is a hash array PagesForTwoWords to store
page-lists which contain a certain word pair. For example, given a
search query [cat dog], a hash value H is generated based on the string
"cat*dog". Then we go to PagesForTwoWords[H] for a pointer to a list of
pages which contain both "cat" and "dog".

This guess is based on the claim that "time is precious but storage is
cheap". Does anyone know the real or a better approach?

Regards,
Yao Ziyuan

.



Relevant Pages

  • chapter4
    ... The for statement in Python differs a bit from what you may be used to ... list or a string), in the order that they appear in the sequence. ... (this can only happen for mutable sequence types, such as lists). ... The keyword def introduces a function definition. ...
    (Ubuntu)
  • Re: Keywords and CL-WHO
    ... All the segments and elements are broken into lists of ... of writing the code needed to transform a parsed XML tree directly ... It appears that you are confusing macroexpansion time [which is ... begins with a keyword is not legal CL source code, ...
    (comp.lang.lisp)
  • Re: Lisp for the C21
    ... Every functional language I remember has lists as matter of course, ... Common Lisp has optional and keyword arguments. ... to let the compiler use lists in these cases. ...
    (comp.lang.functional)
  • Re: Lisp for the C21
    ... and the need for variadic functions goes ... Every functional language I remember has lists as matter of course, ... Common Lisp has optional and keyword arguments. ... (defun foo (bar &optional baz) ...
    (comp.lang.functional)
  • Parser or regex ?
    ... The trouble is that I have to pull out the separate arguments, ... pull apart the keyword arguments and the list keyword arguments. ... I can use ``_paramstring`` with findall to pull out all the arguments. ... For keyword arguments and lists constructors I use another regular ...
    (comp.lang.python)