Re: Algorithm help for unique string searching/counting within an array.



Thomas Koenig wrote:
Paul Van Delst <Paul.vanDelst@xxxxxxxx> wrote:

I want to be able to search an array of strings (with many repeated elements) so I can count the unique elements.

Are you sure you want to use Fortran for this?

Yes.

Languages like Perl or
awk have built-in arrays with strings as subscripts. Perl calls
them "hashes".

As do many other languages. Or "directories".

Here's a short Perl program (for illustration), which assumes that
all the strings are in separate lines.

#! /usr/bin/perl

while(<>) { # Loop over all entries in the array, putting values into $_ .
chomp; # Eat newlines
$string{$_}++; # Increment the hash element with the subscript $_ by
# one, creating the element if necessary.
}

print scalar(keys(%string)),"\n"; # Count the number of keys to the hash.

I can also do the above in Ruby in one line. Doesn't help me in a Fortran executable though. :o) To be fair to folks who think doing this in Fortran is not entirely appropriate, this capability of sensor counting is an existing function (say, method) associated with a particular derived type (object) so there's a Fortran-centric history in my requirements. We're moving from representing sensors with an arbitrary integer to a people-readable string. My request to this group was due to me not being very smart when it comes to working with strings.

Hash tables for my original little application are a bit of overkill I reckon. I figure doing a sort and then a count using CSHIFT, e.g.:

! Define some sensor strings
cArray = (/ 'abi_gr ','imgr_g13 ','windsat_coriolis',&
'amsua_metop-c ','amsub_n15 ','amsua_n18 ',&
'amsua_n18 ','hirs3_n17 ','hirs3_n15 ',&
'modis_terra ','mhs_metop-a ','amsua_metop-b ',&
'amsua_metop-c ','amsub_n15 ','amsua_n18 ',&
'amsua_n18 ','hirs3_n17 ','hirs3_n15 ',&
'modis_terra ','mhs_metop-a ','amsua_metop-b ',&
'abi_gr ','imgr_g13 ','windsat_coriolis'/)

! Sort them
CALL InsertionSort( cArray )

! Count the unique entries
nSensors = MAX(COUNT(cArray /= CSHIFT(cArray,1)),1)

This seems to work pretty well. I'm also implementing a version something like so:

! Sort them
CALL InsertionSort( cArray, Idx ) ! Return a sorted index

! Count the unique entries
nSensors = MAX(COUNT(cArray(Idx) /= CSHIFT(cArray(Idx),1)),1)

for those cases where I don't want to change the order of the original array.

As *** Hendrickson pointed out in his post, once the array is sorted, the uniqueness determination is pretty trivial. I find the above pretty easy to understand. It sure is a *lot* tidier than the dog's dinner of a program I originally posted! :o)

cheers,

paulv

--
Paul van Delst Ride lots.
CIMSS @ NOAA/NCEP/EMC Eddy Merckx
Ph: (301)763-8000 x7748
Fax:(301)763-8545
.


Quantcast