Spell checking algorithms
From: Leo P. (mathgodleo_at_yahoo.com)
Date: 06/29/04
- Next message: Raymond DeCampo: "Re: Regex on whole (large) text file"
- Previous message: WEC: "Re: Nube question - Curent SDK"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 29 Jun 2004 14:00:30 -0700
I am trying to write a spelling checker. Specifically the part that
suggests words. From what I've read online, it seems like the
preferred way to do this, is to find the double metaphone equivalent
of a mispelled word, then to compute the mispelled word's distance
from other words which have the same, or a similar metaphone
equivalent.
I've run into some problems with this approach. For example,
"accommodation" has a metaphone code of AKMT, while "acocmmodation"
has a metaphone code of AKKM. These are different enough that my
implementation doesn't even try to see if the two words are close
together, although it's clear that they are.
Frustrated by this, I tried to compute the distance (Levenshtein)
between "acocmmodation" and every other word in my dictionary file.
This was suprisingly fast, and the correct suggestion appeared. This
makes me wonder, what is the point of doing all of the double
metaphone stuff, as opposed to just a brute force search? Why do all
of the spell checking algorithms first convert a mispelled word into a
phoenetic representation before comparing it to other words? Is it
because the speed used to be an issue or slower machines, or am I
missing something? For reference, when I say that the brute force
search was 'surprisingly fast', it took about 75 milliseconds to
compute the levenshtein distance from "acocmmodation" to every word in
a dictionary of 115K words.
Leo
- Next message: Raymond DeCampo: "Re: Regex on whole (large) text file"
- Previous message: WEC: "Re: Nube question - Curent SDK"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]