Re: Algorithm help for unique string searching/counting within an array.
- From: glen herrmannsfeldt <gah@xxxxxxxxxxxxxxxx>
- Date: Wed, 31 May 2006 00:32:06 -0700
Paul Van Delst wrote:
I want to be able to search an array of strings (with many repeated elements) so I can count the unique elements. I've attached a working program at the bottom of the post so you can see what I've got so far.
If you expect duplicates a hash table is probably easier and faster.
You might find a hash routine with a built in hash function, otherwise
my favorite is the CRC32 polynomial, which is easy to implement using
a 256 element table and some shift and exclusive or functions. After computing the CRC32 value for each string, take that value modulo the hash table size. Easiest is a linked list at each hash position, but
there are other possibilities. Maybe you can find an already written hashing routine.
As others have said, there are languages which make this very easy,
such as perl and awk. Many C libraries have a hash routine, as does
Java. From reading a file with a list to writing the result is
about two lines of AWK, 20 of C, or 30 of Java. (For the latter two, most of the work is reading the input file and writing the result.)
Sorting should be O(N log N), even for an external sort. Hashing is usually considered O(N), though that depends on N not being too large.
I have done hash table counting for millions of unique entries (out of billions). You really want the hash table to fit in memory.
If it can't, then an external sort might be better.
-- glen
.
- References:
- Algorithm help for unique string searching/counting within an array.
- From: Paul Van Delst
- Algorithm help for unique string searching/counting within an array.
- Prev by Date: Re: Algorithm help for unique string searching/counting within an array.
- Next by Date: Re: allocatable array *not* equivalent to dynamic allocation?
- Previous by thread: Re: Algorithm help for unique string searching/counting within an array.
- Next by thread: Re: Algorithm help for unique string searching/counting within an array.
- Index(es):
Relevant Pages
|