Re: traversing (and accessing values in) a hash of hashes

From: Lawrence Statton (lawrence_at_cluon.com)
Date: 12/12/04


To: James marks <jamarks@jamarks.com>
Date: Sat, 11 Dec 2004 17:24:25 -0800


> In the script, there are two variables produced from a regexp match
> against the current line in the access_log:
>
> $new_ip
> $user_agent
>
> and one that is determined by a subroutine that looks for clues that
> the user_agent is a robot:
>
> $user_type
>
> This is the subroutine that builds the hash of hashes. It compares the
> $new_ip in the current line of the access_log with IPs already in the
> hash and updates the user_type/user_agent info if it finds a match and
> creates a new entry if it doesn't.
>
> sub build_user_hash {
> foreach $existing_ip (%user_hash) {
> if ($existing_ip eq $new_ip ) {
> $user_hash {$existing_ip} {user_type} = $user_type;
> $user_hash {$existing_ip} {user_agent} = $user_agent;
> return;
> }
> }
> $user_hash {$new_ip} {user_type} = $user_type;
> $user_hash {$new_ip} {user_agent} = $user_agent;
> $unique_ip_count++;
> }
>

Okay -- now that I have some code to look at, I can give you better advice:

1) use strict;

2) REALLY ... We Mean It. ***USE STRICT***

3) global variables are for 12 year old VB programmers. Avoid them
    unless absolutely necessary.

Here's a slightly cleaned up version of your program, with some data
to test with ...

N.B. You don't want to iterate across the entire %user_hash in
the foreach -- in list context a hash flattens to its keys AND its
values, so the loop runs once for each key and again for each value.
You want it to run once per key. It turns out that each value here is
a hash reference, which as a string looks like HASH(0x11fa80).
That string is unlikely to collide with an actual IP address, and
therefore won't harm your script, but it's Bad Style.
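
A minimal sketch of that difference, if you want to see it for
yourself. The %demo hash and the two helper subs are made up for
illustration; they are not part of your script.

```perl
use strict;
use warnings;

# How many items does a foreach see when handed the hash itself?
sub items_in_hash {
    my ($h) = @_;
    my @items = %$h;          # list context: key, value, key, value, ...
    return scalar @items;
}

# And how many when handed only the keys?
sub items_in_keys {
    my ($h) = @_;
    my @items = keys %$h;     # each key exactly once
    return scalar @items;
}

# Made-up data, just for illustration.
my %demo = (
    '10.0.0.1' => { user_type => 'human' },
    '10.0.0.2' => { user_type => 'robot' },
);

print "foreach (%demo) would loop ", items_in_hash(\%demo), " times\n";
print "foreach (keys %demo) would loop ", items_in_keys(\%demo), " times\n";

# The extra items are the inner hash references, which stringify
# as HASH(0x...) when compared with eq or printed.
print "$_\n" for grep { ref } %demo;
```

With two entries the first count is 4 and the second is 2; the
addresses printed on the last two lines will differ from run to run.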

.................................. BEGIN PERL PROGRAM ..................
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

our %user_hash;
our $unique_ip_count;

sub build_user_hash {
    my ($new_ip, $user_agent, $user_type) = @_ ;
    foreach my $existing_ip ( keys %user_hash) {
        if ($existing_ip eq $new_ip ) {
            $user_hash {$existing_ip} {user_type} = $user_type;
            $user_hash {$existing_ip} {user_agent} = $user_agent;
            return;
        }
    }
    $user_hash {$new_ip} {user_type} = $user_type;
    $user_hash {$new_ip} {user_agent} = $user_agent;
    $unique_ip_count++;
}

# populate the hash with some fake data

while (<DATA>) {
    if (my ($ip, $agent, $type ) = m/^(.*) - (.*) - (.*)$/) {
        build_user_hash($ip, $agent, $type) ;
    }
}

print Dumper \%user_hash;

print "Here come da judge ... \n";

foreach my $key ( keys %user_hash ) {
    print "$key \"$user_hash{$key}{user_agent}\"\n"
        if $user_hash{$key}{user_type} eq 'robot';
}

__DATA__
192.168.1.1 - ...TOPS-20...EMACS... - human
192.168.1.1 - ...Linux...Opera... - human
192.168.1.3 - ...Yahoo!Slurp... - robot
192.168.1.5 - ...Windows...MSIE... - human
192.168.1.7 - ...Solaris...Mozilla... - human
192.168.1.5 - ...GNU...LWP/0.1... - robot

................................... END PERL PROGRAM ...................

Producing the output:

$VAR1 = {
          '192.168.1.1' => {
                             'user_type' => 'human',
                             'user_agent' => '...Linux...Opera...'
                           },
          '192.168.1.3' => {
                             'user_type' => 'robot',
                             'user_agent' => '...Yahoo!Slurp...'
                           },
          '192.168.1.5' => {
                             'user_type' => 'robot',
                             'user_agent' => '...GNU...LWP/0.1...'
                           },
          '192.168.1.7' => {
                             'user_type' => 'human',
                             'user_agent' => '...Solaris...Mozilla...'
                           }
        };
Here come da judge ...
192.168.1.3 "...Yahoo!Slurp..."
192.168.1.5 "...GNU...LWP/0.1..."

........................................................................
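
One aside on build_user_hash before moving on: I kept your loop so the
code stayed recognisable, but a hash lookup is already a search, so the
loop isn't needed at all. Here is a rough sketch of the same logic
using exists. The name record_user, and the choice to pass the hash in
as a reference and return a "was it new?" flag, are my own assumptions
and not something your script has to adopt.

```perl
use strict;
use warnings;

# Record one access-log hit. Returns 1 if the IP was not seen before,
# 0 if an existing entry was overwritten.
sub record_user {
    my ($users, $ip, $agent, $type) = @_;
    my $is_new = exists $users->{$ip} ? 0 : 1;
    $users->{$ip}{user_type}  = $type;
    $users->{$ip}{user_agent} = $agent;
    return $is_new;
}

my %seen;              # stand-in for %user_hash
my $unique = 0;        # stand-in for $unique_ip_count

$unique += record_user(\%seen, '192.168.1.1', '...TOPS-20...EMACS...', 'human');
$unique += record_user(\%seen, '192.168.1.1', '...Linux...Opera...',   'human');
$unique += record_user(\%seen, '192.168.1.3', '...Yahoo!Slurp...',     'robot');

print "$unique unique IPs\n";    # prints "2 unique IPs"
```

Passing the hash in also gets rid of the two globals, which ties in
with point 3 above.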

So -- this program does in fact work, but may not do what you want.

For example: Is using the IP address as the key really a good idea?
Might an IP address be assigned to two different users -- or might a
person use two different user agents on one host? Do you want to deal
with that? Do you want the per-host entry to be a LIST of
user_type/user_agent pairs?

Consider the following structure:

our %users = ( '192.168.1.1' => [ { user_type => 'human',
                                    user_agent => '...Browser...' },
                                  { user_type => 'robot',
                                    user_agent => '...Some Robot...' } ],
               '192.168.1.2' => [ { user_type => 'robot',
                                    user_agent => '...Another Robot...' } ],
               '192.168.1.3' => [ { user_type => 'human',
                                    user_agent => '...Browser X...' },
                                  { user_type => 'human',
                                    user_agent => '...Browser Y...' } ] );

How do you think you'd create that and iterate across it...?
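
For the "create" half, here is one possible sketch (the iterating half
I'm leaving alone). It leans on autovivification: push onto
@{ $users->{$ip} } and Perl creates the array reference the first time
an IP shows up. The helper name add_sighting is hypothetical.

```perl
use strict;
use warnings;

# Append one user_type/user_agent pair to the list kept for an IP.
# Returns how many pairs that IP now has.
sub add_sighting {
    my ($users, $ip, $agent, $type) = @_;
    # push autovivifies $users->{$ip} as an array ref on first use.
    push @{ $users->{$ip} }, { user_type => $type, user_agent => $agent };
    return scalar @{ $users->{$ip} };
}

my %users;
add_sighting(\%users, '192.168.1.1', '...Browser...',       'human');
add_sighting(\%users, '192.168.1.1', '...Some Robot...',    'robot');
add_sighting(\%users, '192.168.1.2', '...Another Robot...', 'robot');

print scalar( @{ $users{'192.168.1.1'} } ), " entries for 192.168.1.1\n";
```

Note that this version never overwrites anything, so the same
agent seen twice at one IP is stored twice. Whether that is what you
want is part of the design question above.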

Your homework: Starting with that structure above, write a subroutine
that will iterate across it producing the following output:

lawrence /tmp > perl /tmp/ziz.pl
A human at 192.168.1.1 used ...Browser...
A robot at 192.168.1.1 used ...Some Robot...
A human at 192.168.1.3 used ...Browser X...
A human at 192.168.1.3 used ...Browser Y...
A robot at 192.168.1.2 used ...Another Robot...

Hint - it will definitely be easier to explicitly break the structure
into lists and hashes at first. Then squeeze your code to eliminate
the intermediate variables ...

foreach my $key (keys %users ) {
    my @list = @{$users{$key}};
    .
    .
    .
}

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
        Lawrence Statton - lawrenabae@abaluon.abaom s/aba/c/g
Computer software consists of only two components: ones and
zeros, in roughly equal proportions. All that is required is to
sort them into the correct order.


