Re: Do java support 'String' methods properly ?
- From: "Ingo R. Homann" <ihomann_spam@xxxxxx>
- Date: Tue, 08 Nov 2005 13:27:29 +0100
Hi,
Googmeister wrote:
This is how Java implements indexOf() as well. It is efficient (linear) for "typical" inputs, but has a quadratic worst case.
Depends on what is 'typical' for you. :-)
(BTW, linear search is not really quadratic, but M*N.)
Google for Knuth-Morris-Pratt and Boyer-Moore-Horspool for *efficient* String searching algorithms!
Anyone know why Java's indexOf doesn't use Boyer-Moore (sublinear on typical inputs) for long strings? I suspect it's a combination of requiring extra space proportional to the alphabet size and difficulties in dealing with the subtleties of Unicode (e.g., code points). But it would be nice to hear from the experts.
IIRC, all algorithms (linear, BMH, KMP) have different advantages, depending on the sizes of the text and the pattern, the size of the alphabet, the distribution of the characters and if you are searching a pattern in different texts (do not forget that you need to generate a "suffix-list" for the pattern)!
If the both the text and the pattern are very short (perhaps only 10 to 20 chars?), it is obvious that linear search ist the best. If the alphabet is very short (eg only A and B), the pattern might be AAAAA, so that KMP (*) has no advantage over linear search as well.
Fact is: You simply cannot decide which algorithm is 'best' if you do not know *anything* about the application (which of course sun can't).
And if you want to implement a large text-database or a gene-sequencer, it is obvious that you have to think *a lot* about searching-algorithms, so that *no* implementation (that has not been optimized especially for the exact application (**)) is perfect.
Ciao, Ingo
(*) It has been nearly 10 years since I implemented these algorithms, and I remember there were some slight differences between BM and BMH wich I do not remember right now. So, forgive me, if I confuse the algorithms...
(**) In these cases, it might often be useful if you do not have the whole text in RAM, but e.g. to have a linked list of "chunks" of texts, with one chunk perhaps of the size of some MB. The rest remains on disk. No default-implementation can deal with that!
.
- Follow-Ups:
- Re: Do java support 'String' methods properly ?
- From: Googmeister
- Re: Do java support 'String' methods properly ?
- References:
- Re: Do java support 'String' methods properly ?
- From: Ingo R. Homann
- Re: Do java support 'String' methods properly ?
- From: Googmeister
- Re: Do java support 'String' methods properly ?
- Prev by Date: Re: javax.usb
- Next by Date: Re: javax.usb
- Previous by thread: Re: Do java support 'String' methods properly ?
- Next by thread: Re: Do java support 'String' methods properly ?
- Index(es):
Relevant Pages
|