Re: Algorithm help for unique string searching/counting within an array.



On 2006-05-30 13:35:32 -0300, Paul Van Delst <Paul.vanDelst@xxxxxxxx> said:

*** Hendrickson wrote:
Can't you just sort the array and then look for
A(I) /= A(I+1) ?

Or am I missing something obvious?

No, I don't think so. I was trying to avoid sorting the array as the order of the various elements is important for other reasons. However, for simply counting the sensors, I think sorting is the simpler approach.

(To be honest, I was wondering if some of the codesmiths out there would come up with a one liner using an f95 intrinsic in some neato, unthought-of-by-me, new way. :o)

cheers,

paulv

This is a case where you would sort the pointers to (i.e. subscripts of) the array
rather than the array. It is often called a "formal sort". Then you can either
access the array in its old/natural order or in its sorted order. The sorted
order makes finding duplicates trivial as noted.

Formal sorts get used when the objects to be sorted are big, vary in size, may
only be virtual, etc, etc.




*** Hendrickson

Paul Van Delst wrote:

Hello,

I want to be able to search an array of strings (with many repeated elements) so I can count the unique elements. I've attached a working program at the bottom of the post so you can see what I've got so far.

What I've go is a list of strings for different satellite sensors and I want to count how many unique sensors there are in the list. I do this by determining how many sensors I *haven't* counted (via a logical array) and their index positions. For these uncounted sensors, I select the first one and determine the index positions of all matches, and then remove them from subsequent counts by setting the logical array. I increment the sensor counter and then do it all over again.

The result I came up with sure looks ugly (lots of housekeeping arrays and counters) - and I think for a larger array of strings it will be pretty inefficient due to the repeated usage of COUNT and PACK. Hence my request for help. Does anyone have a smarter method of counting the occurrances of unique strings within a larger array?

cheers,

paulv


-----------------------
program testcount
integer,parameter::N=56
character(16),dimension(N)::sensorid
integer,dimension(N)::lindgen
logical,dimension(N)::uncounted
integer,dimension(:),allocatable::idx_uncounted,idx_matched
integer::i,n_uncounted,n_matched,n_sensors
character(16)::sensorid2match

sensorid=(/&
'abi_gr ','abi_gr ','abi_gr ','abi_gr ',&
'amsre_aqua ','amsre_aqua ','amsre_aqua ','amsre_aqua ',&
'amsua_metop-a ','amsua_metop-a ','amsua_metop-a ','amsua_metop-b ',&
'amsua_metop-b ','amsua_metop-c ','amsua_metop-c ','amsua_metop-c ',&
'amsua_metop-c ','amsua_n15 ','amsua_n15 ','amsua_n18 ',&
'amsua_n15 ','amsua_n15 ','amsua_n16 ','amsua_n16 ',&
'amsua_n17 ','amsua_n18 ','amsua_n18 ','amsua_n18 ',&
'amsua_n18 ','amsua_n18 ','amsua_n16 ','amsua_n18 ',&
'hirs3_n15 ','hirs3_n15 ','hirs3_n16 ','hirs3_n16 ',&
'hirs3_n16 ','hirs3_n17 ','hirs3_n17 ','hirs3_n18 ',&
'hirs3_n18 ','amsub_n16 ','amsub_n16 ','amsub_n16 ',&
'imgr_g13 ','imgr_g13 ','imgr_g13 ','imgr_g13 ',&
'mhs_metop-a ','mhs_metop-a ','modis_terra ','modis_terra ',&
'ssmi_f14 ','windsat_coriolis','windsat_coriolis','windsat_coriolis'/)

! The sensorID index array
lindgen=(/(i,i=1,N)/)

! Specify all as uncounted
uncounted=.true.

! Initialise sensor counter
n_sensors=0

do
! Find the indices of the uncounted sensors
n_uncounted=count(uncounted)
print *, 'Number of uncounted elements: ', n_uncounted, ' out of ', N
if(n_uncounted==0) exit
allocate(idx_uncounted(n_uncounted))
idx_uncounted=pack(lindgen,uncounted)

! Select the first uncounted sensor for this loop
sensorid2match=sensorid(idx_uncounted(1))
print *, ' Searching for ',sensorid2match

! Amongst the uncounted sensors, find the selected sensor
n_matched=count(sensorid(idx_uncounted)==sensorid2match)
print *, ' Number of ', sensorid2match, ' entries ', n_matched
allocate(idx_matched(n_matched))
idx_matched=pack(lindgen(idx_uncounted),&
sensorid(idx_uncounted)==sensorid2match)
print *, idx_matched
print *, sensorid(idx_matched)

! Mark the current sensor as counted
uncounted(idx_matched) = .false.

! Increment sensor counter
n_sensors=n_sensors+1
print *, 'n_sensors = ', n_sensors

deallocate(idx_matched,idx_uncounted)
read(*,*) ! Pause for a looksee
end do

end program testcount


.


Quantcast