Re: why does MATCH/AGAINST fail to catch entries that LIKE does catch?



On Tue, 19 Jun 2007 09:08:18 +0200, lawrence k <lkrubner@xxxxxxxxxxxxx> wrote:


Wierd. Go to this page:

http://www.ihanuman.com/search.php

and search for "yoga"

This query gets run:

SELECT * FROM albums WHERE MATCH(name,description) AGAINST ('yoga')
ORDER BY id DESC

it returns nothing. (other searches work, but not the one for
"yoga").

But if I do SELECT * FROM albums WHERE description LIKE '%yoga%'

then I get 5 matches

This clearly a MySQL issue, NOT PHP.
However, I think this might shed some light, from the manual:

"The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes (?'?), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb.

The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, ? ? (space), ?,? (comma), and ?.? (period). If words are not separated by delimiters (as in, for example, Chinese), the FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index, you must preprocess them so that they are separated by some arbitrary delimiter such as ?"?.

Some words are ignored in full-text searches:

Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.

Words in the stopword list are ignored. A stopword is a word such as ?the? or ?some? that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overwritten by a user-defined list."

"Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word ?MySQL? is present in every row of the articles table shown earlier, a search for the word produces no results:

mysql> SELECT * FROM articles
-> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word ?MySQL? is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large datasets, this is the most desirable behavior: A natural language query should not return every second row from a 1GB table. For small datasets, it may be less desirable."


So, in short: 'yoga' might not be found as a separate word, or be considered to 'common' to match. For more details, ask a MySQL-group.

--
Rik Wasmus
.



Relevant Pages

  • Re: Problem with SQL and Recordset
    ... This assumes that the name of the subform control on the parent form is the ... NextSuffix field from the "query" to be entered into the Suffix field on ... Dim mySQL As String ...
    (microsoft.public.access.modulesdaovba)
  • Re: [Q]Problems related to the MySQL linked Server.
    ... The error messge when I query data with following SQL. ... select @v_intUserNo = uno ... where userid = 'testid' ... I have two questions about the linked server for the MySQL. ...
    (microsoft.public.sqlserver.odbc)
  • Re: Performance MySQL
    ... schicke erst den Query an die Datenbank ab, ... Möglichkeit von Concurrent Inserts nur MyISAM Storages ein. ... > MySQL bis jetzt noch nichts zu bieten. ... Welcher Tabellentyp? ...
    (de.comp.datenbanken.mysql)
  • Re: WEIGHT confusion (SQL 2k, running on Win2k3)
    ... what I think you really need is a multiple column join using a single ... multiple CONTAINSTABLE query will perform very poorly as each CONTAINSTABLE ... >> is all basic configuration info necessary to troubleshoot SQL FTS ... >> but I need to test it out a bit with a variable weight parameter. ...
    (microsoft.public.sqlserver.fulltext)
  • saved queries based on other saved queries vs. one big query in a forms module using DAO
    ... flawed set of queries I made and now it's finally back to the drawing board. ... I have several saved queries, ... The activities each have a weight within that group and the ... The first time I tried this I made a query that got the summary infomation ...
    (microsoft.public.access.queries)