Re: fast dictionary search algorithm
From: Gerry Quinn (gerryq_at_indigo.ie)
Date: 02/09/04
- Next message: Corey Murtagh: "Re: turing completeness"
- Previous message: Willem: "Re: Optimization"
- In reply to: xbunny: "Re: fast dictionary search algorithm"
- Next in thread: Gerry Quinn: "Re: fast dictionary search algorithm"
- Reply: Gerry Quinn: "Re: fast dictionary search algorithm"
- Reply: Willem: "Re: fast dictionary search algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 09 Feb 2004 12:20:26 GMT
In article <l5JVb.23739$P32.9918@news-binary.blueyonder.co.uk>, xbunny <xbunny@eidosnet.co.uk> wrote:
>Prashanth Ellina wrote:
>
>> Hi,
>> I have a text file with words and meanings. The record count is
>> approximately 100,000. I need to develop an algorithm that will find a
>> requested word's meaning in the shortest time possible. The algorithm
>> may be a combination of existing search algorithms or an
>> improvisation.
>>
>> I have considered hashing and multiple level hashing as an option
>> but am not sure about the hash function. People tell me it is
>> difficult to avoid collisions. Have tried binary search, however need
>> something faster.
>>
>> any ideas?
>
>Im no expert but I would have thought a binary search for a hash from
>the source word would have been amply fast enough for a 100,000 word
>dictionary. So long as your compare function is simple then it should
>fly. I would try dealing with collisions the simplest way ie (chained
>buckets in the hash table) and see its good enough but you can always
>try double hashing if you like or something more estoric I suppose.
>Theres are lots of string hashing functions on the net, quick google
>reveals some really fancy ones!
If the words come as ASCII chars and have 26 letters plus hyphen and
null character, I suppose one might in principle use a cunning sequence
of arrays based on converting one or more letters into an integer, then
using it as an array offset. The offset would contain a code which
represents either a pointer to another array which takes an offset based
on the next few letters, or to a found meaning, or to one or more more
compact search methods that are used when multiplying further arrays
would use up too much memory.
Say we have "vertiginous":
Make an integer offset from "ver"
Now there are maybe 500 words (guessing)
Code says take the next three letters and offset from a second array
that has entries for "teb" and "tig" among others
Calling second array with "tig" offset
Only five words start with "vertig", code points to an array of strings
in which a binary search is conducted by text comparison on the
remainder of the string ("inous\0").
That might be faster than a hash and 17 integer comparisons, followed by
collision testing.
Of course the program that sets up the tables would have to be quite
sophisticated in order to optimise memory use.
Gerry Quinn
-- http://bindweed.com Screensavers, Games, Kaleidoscopes Download free trial versions
- Next message: Corey Murtagh: "Re: turing completeness"
- Previous message: Willem: "Re: Optimization"
- In reply to: xbunny: "Re: fast dictionary search algorithm"
- Next in thread: Gerry Quinn: "Re: fast dictionary search algorithm"
- Reply: Gerry Quinn: "Re: fast dictionary search algorithm"
- Reply: Willem: "Re: fast dictionary search algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|