Re: Accomodate for poor db design using Sort::Maker?



John W. Krahn wrote:
DJ Stunks wrote:

Hi all, topical question here regarding Sort::Maker

I have a poorly designed db table over which I have no control. The
data isn't normalized but for each row I'd like to return the column
numbers which have the highest number of counts, and the highest column
number with any counts at all. I refer to these values as mode and max
respectively, but please refer to my code and it's output below for a
better understanding. (I have provided sample input data and mimicked
the fetchrow_hashref function provided by DBI)

Both sorts seem klunky, and after the discussion about the efficiency
gains given by Sort::Maker in another thread I'd like to take advantage
if I can (the sorts must be performed for each row returned by the
database).

I'm not sure how to pass in the information in the $hashref to
make_sorter. Advice would be greatly appreciated.

TIA,
-jp

Code:
#!/usr/bin/perl

use strict;
use warnings;

print "id\tmode\tmax\n";
while ( my $hashref = fetchrow_hashref() ) {
my $id = $hashref->{id};

# mode = col number with greatest count
my $mode = ( map { $_->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];

# max = highest col number with any count
my $max = ( sort { $b <=> $a }
map { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];

$max = 1 if $max eq ''; #all zeros



If you want efficiency then you should just use loops instead of sorting:


my ( $mode, $curr ) = ( 1, 0 );
for ( 1 .. 5 ) {
if ( $hashref->{ "col$_" } > $curr ) {
$curr = $hashref->{ "col$_" };
$mode = $_;
}
}

my $max = 1;
for ( reverse 1 .. 5 ) {
if ( $hashref->{ "col$_" } > 0 ) {
$max = $_;
last;
}
}



John

Have you looked at List::Util? It's a standard module in Perl 5.8, and contains a max function. So the first loop becomes something like

use List::Util qw{max};

my $mode = max (map {$hashref->{"col$_"}} 1..5);

Assuming the only zeros are trailing, the second loop could be

my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5

since the value of a list in scalar context is the number of elements in the list. If you have zeros embedded, this won't give the right answer, since what it really does is find the number of values > 0, not the highest index with a value > 0.

Of course, for either of these you might want to make the hashref into a real list, to avoid mapping twice.

Tom Wyant
.