Re: Good search theory
nospam_at_geniegate.com
Date: 03/16/05
- Next message: Eclectic: "multi sorting multi dimensional array?"
- Previous message: nospam_at_geniegate.com: "Re: Object Oriented Content System - the idea"
- In reply to: AaronV: "Good search theory"
- Next in thread: AaronV: "Re: Good search theory"
- Reply: AaronV: "Re: Good search theory"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 16 Mar 2005 05:54:49 GMT
In: <1110915682.132517.134550@l41g2000cwc.googlegroups.com>, "AaronV" <aaron.vanderpoel@gmail.com> wrote:
>Hello,
>
>I'm a webmaster for a college newspaper and I'm implementing an article
>search. I'm running PHP with a MySQL database to store the weekly
>stories. Does anyone know of an article that could offer good search
>theory.
If it's an option for you, have a look at Swish-e.
I don't know whether there is a PHP interface, though. It's somewhat difficult to
set up, but the folks who wrote it really did a good job. There are all kinds
of ways to configure Swish-e to index META tags and the like.
Proximity and phrase searches are tricky to implement, but Swish-e handles
them.
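To give a feel for why phrase and proximity matching take extra work, here is a
rough Python sketch of a positional index. It is only an illustration, not how
Swish-e (or any tool named in this post) does it internally. The function names
and sample articles are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of documents containing the terms of `phrase` consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Term i of the phrase must sit exactly i positions after the first.
            if all(start + i in index.get(term, {}).get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


def near_search(index, term_a, term_b, window):
    """Return ids of documents where the two terms occur within `window` words."""
    a_postings = index.get(term_a.lower(), {})
    b_postings = index.get(term_b.lower(), {})
    hits = set()
    for doc_id in a_postings.keys() & b_postings.keys():
        if any(abs(pa - pb) <= window
               for pa in a_postings[doc_id]
               for pb in b_postings[doc_id]):
            hits.add(doc_id)
    return hits


# Hypothetical sample articles, keyed by article id.
articles = {
    1: "Student council votes on the budget",
    2: "The budget council heard student votes",
    3: "Campus parking fees rise",
}
idx = build_positional_index(articles)
print(sorted(phrase_search(idx, "student council")))        # [1]
print(sorted(near_search(idx, "student", "council", 2)))    # [1, 2]
```

The point is that the index has to remember where each word occurs, not just
which documents contain it, which is a good part of why the index gets bigger
and the matching gets fiddly.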
If Swish-e won't work, another option might be Lucene:
http://lucene.apache.org/java/docs/
It's been a few years, but when I checked into it, Lucene was quite good as well.
It's Java, which may be an issue if you're not already running servlets.
It's surprisingly fast, especially considering it's Java.
Another option is ht://Dig.
Last I checked, it didn't do phrase matching, but it's quite mature. It has been
around a long time and several people are using it. It's the easiest one I've
seen where setup is concerned. If you don't require phrase matching, it's pretty
decent.
All of the ones I've listed use an index and scale pretty well.
I wouldn't try to use them in place of teoma.com (with the possible exception of
multiple Lucene instances), but I bet they would work well for your application.
One could probably fill a small library (or at least a full section of a
library) with books on the subject of searching full text. 'tis not an easy
task.
>Seems like there are a lot of choices in how to set up a good search
>system and I'd like to get started on the right foot to reduce my work
>load.
Maybe I'm prejudiced, but in my opinion SQL databases are not really designed
for searching full text. (It's been a while, but I've been burned by them for
full-text search in the past.) I suppose for a few hundred articles and/or
highly custom search tools, an SQL database would work. (If your articles are
in XML, then such a database would be OK for searching in titles or maybe within
predetermined XML containers like <var>..</var>.)
The "issue" I take with them is that you are effectively using a database
AS an index. A database's primary goal is (or should be) data storage. Full-text
indices are a different beast altogether.
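For a rough idea of what "a different beast" means, here is a minimal inverted
index sketched in Python. Instead of scanning the text of every stored article
for a substring, it keeps a map from each word to the set of articles that
contain it, so a query only touches the entries for the words asked about.
The class and method names are invented for this example, and real engines add
a lot on top (ranking, stemming, on-disk storage).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory index: term -> set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Record every term of `text` as occurring in `doc_id`."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing ALL terms of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Intersect the smallest posting sets first to keep the work low.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return result


# Hypothetical usage with made-up article ids and text.
index = InvertedIndex()
index.add(1, "Student council votes on the budget")
index.add(2, "Campus parking fees rise")
index.add(3, "Council debates parking budget")
print(sorted(index.search("council budget")))   # [1, 3]
```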
SQL databases are excellent for setting up a prototype or "proof of concept" but
quickly break down when used for larger quantities of data. (This opinion is based
on a context-aware search tool, done in 1999; 6 years is a long time and things
may have changed.)
They do make good storage for URLs, last index times, things like that.
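As a small sketch of that kind of bookkeeping, the Python below keeps a table of
URLs and the time each one was last indexed, so a re-indexing job can find stale
pages. It uses the standard sqlite3 module as a stand-in for MySQL. The table
name, column names, and helper functions are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the bookkeeping table; names here are made up."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_pages ("
        " url TEXT PRIMARY KEY,"
        " last_indexed INTEGER NOT NULL)"  # Unix timestamp in seconds
    )
    return conn


def mark_indexed(conn, url, when):
    """Record that `url` was indexed at Unix time `when`."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_pages (url, last_indexed) VALUES (?, ?)",
        (url, when),
    )
    conn.commit()


def stale_urls(conn, cutoff):
    """Return URLs last indexed before Unix time `cutoff`, sorted by URL."""
    rows = conn.execute(
        "SELECT url FROM indexed_pages WHERE last_indexed < ? ORDER BY url",
        (cutoff,),
    )
    return [row[0] for row in rows]
```

The search engine keeps its own index files. The database only tracks what has
been indexed and when.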
Jamie
-- http://www.geniegate.com Custom web programming guhzo_42@lnubb.pbz (rot13) User Management Solutions