Re: Algorithm help for unique string searching/counting within an array.





Paul Van Delst wrote:
*** Hendrickson wrote:

Can't you just sort the array and then look for
A(I) /= A(I+1) ?

Or am I missing something obvious?


No, I don't think so. I was trying to avoid sorting the array as the order of the various elements is important for other reasons. However, for simply counting the sensors, I think sorting is the simpler approach.

(To be honest, I was wondering if some of the codesmiths out there would come up with a one liner using an f95 intrinsic in some neato, unthought-of-by-me, new way. :o)

Oh sorry, I totally misunderstood the question ;).

I've got to run now, so this is off the top of my head and
completely untested.

forall(i=1:n) numbers(i) =
merge(0, count(a(i) == A(I:N)), any(a(i) == a(1:I-1))

I think that puts the number of matching sensors in the
first element of numbers for each unique sensor id. I
think it puts a 0 in numbers for the second, and all
subsequent occurrances. I'm not too sure that it will
work for the last element. But this ought to be a darn
good start at cleaning up the code and making it easy to
understand, or at least shorter ;).

*** Hendrickson

cheers,

paulv



*** Hendrickson

Paul Van Delst wrote:

Hello,

I want to be able to search an array of strings (with many repeated elements) so I can count the unique elements. I've attached a working program at the bottom of the post so you can see what I've got so far.

What I've go is a list of strings for different satellite sensors and I want to count how many unique sensors there are in the list. I do this by determining how many sensors I *haven't* counted (via a logical array) and their index positions. For these uncounted sensors, I select the first one and determine the index positions of all matches, and then remove them from subsequent counts by setting the logical array. I increment the sensor counter and then do it all over again.

The result I came up with sure looks ugly (lots of housekeeping arrays and counters) - and I think for a larger array of strings it will be pretty inefficient due to the repeated usage of COUNT and PACK. Hence my request for help. Does anyone have a smarter method of counting the occurrances of unique strings within a larger array?

cheers,

paulv


-----------------------
program testcount
integer,parameter::N=56
character(16),dimension(N)::sensorid
integer,dimension(N)::lindgen
logical,dimension(N)::uncounted
integer,dimension(:),allocatable::idx_uncounted,idx_matched
integer::i,n_uncounted,n_matched,n_sensors
character(16)::sensorid2match

sensorid=(/&
'abi_gr ','abi_gr ','abi_gr ','abi_gr ',&
'amsre_aqua ','amsre_aqua ','amsre_aqua ','amsre_aqua ',&
'amsua_metop-a ','amsua_metop-a ','amsua_metop-a ','amsua_metop-b ',&
'amsua_metop-b ','amsua_metop-c ','amsua_metop-c ','amsua_metop-c ',&
'amsua_metop-c ','amsua_n15 ','amsua_n15 ','amsua_n18 ',&
'amsua_n15 ','amsua_n15 ','amsua_n16 ','amsua_n16 ',&
'amsua_n17 ','amsua_n18 ','amsua_n18 ','amsua_n18 ',&
'amsua_n18 ','amsua_n18 ','amsua_n16 ','amsua_n18 ',&
'hirs3_n15 ','hirs3_n15 ','hirs3_n16 ','hirs3_n16 ',&
'hirs3_n16 ','hirs3_n17 ','hirs3_n17 ','hirs3_n18 ',&
'hirs3_n18 ','amsub_n16 ','amsub_n16 ','amsub_n16 ',&
'imgr_g13 ','imgr_g13 ','imgr_g13 ','imgr_g13 ',&
'mhs_metop-a ','mhs_metop-a ','modis_terra ','modis_terra ',&
'ssmi_f14 ','windsat_coriolis','windsat_coriolis','windsat_coriolis'/)

! The sensorID index array
lindgen=(/(i,i=1,N)/)

! Specify all as uncounted
uncounted=.true.

! Initialise sensor counter
n_sensors=0

do
! Find the indices of the uncounted sensors
n_uncounted=count(uncounted)
print *, 'Number of uncounted elements: ', n_uncounted, ' out of ', N
if(n_uncounted==0) exit
allocate(idx_uncounted(n_uncounted))
idx_uncounted=pack(lindgen,uncounted)

! Select the first uncounted sensor for this loop
sensorid2match=sensorid(idx_uncounted(1))
print *, ' Searching for ',sensorid2match

! Amongst the uncounted sensors, find the selected sensor
n_matched=count(sensorid(idx_uncounted)==sensorid2match)
print *, ' Number of ', sensorid2match, ' entries ', n_matched
allocate(idx_matched(n_matched))
idx_matched=pack(lindgen(idx_uncounted),&
sensorid(idx_uncounted)==sensorid2match)
print *, idx_matched
print *, sensorid(idx_matched)

! Mark the current sensor as counted
uncounted(idx_matched) = .false.

! Increment sensor counter
n_sensors=n_sensors+1
print *, 'n_sensors = ', n_sensors

deallocate(idx_matched,idx_uncounted)
read(*,*) ! Pause for a looksee
end do

end program testcount








.


Quantcast