Re: problem with hash & sort array

From: Jay Tilton (tiltonj_at_erols.com)
Date: Wed, 31 Dec 2003 04:43:47 GMT


[Please be aware of how your news client handles word-wrapping of long
lines. The quoted text below has been reformatted.]

uNConVeNtiOnAL <tomcat@visi.com> wrote:

: I am trying to read from an file and put the lines into a
: hash. Then I put the hash into an array with the sort
: command. This sort will put the array into such order that
: I can see if duplicate lines occur and add their numeric
: total field together. I will add the combined data as new
: hash entries and then remove the original lines that were
: duplicates.
:
: I don't seem to be putting any values in the hash

What sequence of debugging steps leads you to that conclusion?

: before
: you go off on me, this code is very similar to code that is
: working. The twist is I have to identify the duplicate
: data, create a new entry for it (rename one of the elements
: so it is distinguishable from the duplicates), and remove
: all duplicate lines.

Is that relevant? The portions of the program that do that much seem to
have been eliminated in your article.

: open (my_file, "$ARGV[0]") || die "ERROR: missing file";
:
: #load up vmi hash to sort and combine duplicate records
: while (<my_file>)
: {
: chomp;
: $a_a = substr ($_, 0, 4);
: $b_b = substr ($_, 12, 11);
: c_c = substr ($_, 24, 2);
: d_d = substr ($_, 54, 4);
: e_e = substr ($_, 59, 2);
: f_f = substr ($_, 62, 2);
: g_g = substr ($_, 46, 7);

I guess there are supposed to be a few more '$' sigils on the LHS of
those assignments.

Consider Perl's unpack() function as an alternative to substr() for
plucking fixed-width fields from a record. That might go like:

    my @fields =
        unpack 'A4 x8 A11 x1 A2 x20 A7 x1 A4 x1 A2 x1 A2', $_;
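
One way to check that template against the substr() offsets in the
quoted code is a small comparison like the sketch below. The sample
record and the sub names are invented for illustration; only the offsets
and the template come from this thread.

```perl
use strict;
use warnings;

# Hypothetical 64-column record, built so that each field sits at the
# offset used in the quoted substr() calls.  The values are made up.
sub sample_record {
    my $rec = ' ' x 64;
    substr($rec,  0,  4) = 'AAAA';
    substr($rec, 12, 11) = 'BBBBBBBBBBB';
    substr($rec, 24,  2) = 'CC';
    substr($rec, 46,  7) = '  12345';
    substr($rec, 54,  4) = 'DDDD';
    substr($rec, 59,  2) = 'EE';
    substr($rec, 62,  2) = 'FF';
    return $rec;
}

# Fields plucked with substr(), in the order a, b, c, g, d, e, f.
sub fields_by_substr {
    my ($rec) = @_;
    my @out;
    push @out, substr($rec, $_->[0], $_->[1])
        for [0, 4], [12, 11], [24, 2], [46, 7], [54, 4], [59, 2], [62, 2];
    return @out;
}

# The same fields with unpack().  'A' reads a space-padded field and
# strips trailing spaces (leading spaces are kept); 'x' skips one byte.
sub fields_by_unpack {
    my ($rec) = @_;
    return unpack 'A4 x8 A11 x1 A2 x20 A7 x1 A4 x1 A2 x1 A2', $rec;
}

print join('|', fields_by_unpack(sample_record())), "\n";
```

One difference to keep in mind: substr() returns a field's trailing
spaces untouched, while the 'A' format strips them, so the two only
agree exactly when the fields have no trailing padding.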

: $forcombo{"$b_b$d_d$e_e$f_f"} =
: $forcombo{"$b_b$d_d$e_e$f_f"}."^"."$a_a$b_b$c_c$g_g$d_d$e_e$f_f";
: }

You're using string concatenation in the hash value to mimic an array of
arrays. To get the original fields back later, the program has to burst
the string into records, then pluck the fields out of each record again.
This scheme is terribly fragile, not to mention repetitious.

Using a real array reference for the hash value, then pushing a
reference to the array containing the fields is a much saner approach.

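    push @{ $forcombo{ $fields[1], $fields[4], $fields[5], $fields[6] } },
        \@fields;

You'll see Perl's largely ignored "multidimensional hash emulation"
feature being used in the hash subscript there (see the entry for
``$;'' in perlvar). That subscript is just like saying:

    $forcombo{join($; , $fields[1], $fields[4], $fields[5], $fields[6])}

Note that the subscript has to be written out as a comma-separated
list. An array slice such as @fields[1, 4, 5, 6] in that position is
evaluated in scalar context and yields only its last element, so the
records would be grouped on $fields[6] alone.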

That feature exists as a mechanism to mimic complex data structures. It
seems appropriate in this case, since you are concerned more with
collecting similar records together than with having an obsessively
organized data structure.

So in one place I recommend against using string concatenation to mimic
a real data structure, and in the next place I make the exact opposite
recommendation. I'm rather enjoying the apparent paradox.
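
As a rough sketch of how the grouping might look in isolation, the
following builds the composite key with an explicit join on $; (so it
does not depend on how the subscript is parsed) and splits it apart
again afterwards. The records, the sub name group_records, and the
field values are all made up for the example.

```perl
use strict;
use warnings;

# Made-up records, already split into fields (order: a, b, c, g, d, e, f).
my @records = (
    [qw(A001 KEY1 C1 10 D1 E1 F1)],
    [qw(A002 KEY1 C2 5 D1 E1 F1)],    # same b, d, e, f as the first
    [qw(A003 KEY2 C3 7 D1 E1 F1)],
);

# Group records on fields 1, 4, 5 and 6.  Each hash value is a reference
# to an array of field-array references.
sub group_records {
    my %group;
    for my $fields (@_) {
        my $key = join $;, @{$fields}[1, 4, 5, 6];
        push @{ $group{$key} }, $fields;
    }
    return %group;
}

my %forcombo = group_records(@records);

my $sep = $;;    # the subscript separator, "\034" unless changed
for my $key (sort keys %forcombo) {
    my @parts = split /\Q$sep\E/, $key;    # recover the key fields
    printf "%s: %d record(s)\n",
        join('/', @parts), scalar @{ $forcombo{$key} };
}
```

The key only needs to be unique per group, so a joined string is
adequate there, while the records themselves stay as real arrays.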

: close my_file;
:
: @keys = split(/\^/,$forcombo{"$b_bd_de_ef_f"});
                                ^^^^^^^^^^^^^
I guess there are some more missing '$' sigils in there.

This part of the process should be about iterating over the hash values,
and is most probably where the program is going off its rails. Those
scalars were used in creating the %forcombo hash from the data file
contents, but that step is over, and the scalars' values are stale.
"use strict;" and proper variable scoping prevent this kind of mistake.
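
To illustrate, here is a small sketch of what strict does with a
variable that is used outside the block where it was declared. The
snippet being compiled is hypothetical (it only borrows the names
%forcombo and $b_b), and it is wrapped in a string eval so the error
can be captured instead of stopping the program.

```perl
use strict;
use warnings;

# Compile a hypothetical snippet under "use strict" and return the
# error text, if any.  The snippet reads $b_b after its scope has ended.
sub strict_error {
    my $code = <<'END';
use strict;
my %forcombo;
{
    my $b_b = 'key';
    $forcombo{$b_b} = 1;
}
my @keys = $forcombo{$b_b};    # $b_b is out of scope here
END
    eval $code;
    return $@;
}

print strict_error();
```

Perl refuses to compile the snippet and reports a 'Global symbol "$b_b"
requires explicit package name' error, which points straight at the
stale variable.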

: foreach $key (sort(@keys))
: {
: #printf nodupes_file "$key\n";
: $lv_b_b = substr($key, 4 ,11);
: $lv_a_a = substr($key, 0, 4);
: $lv_c_c = substr($key, 15, 2);
: $lv_d_d = substr($key, 24,4);
: $lv_e_e = substr($key, 28,2);
: $lv_f_f = substr($key, 30,2);
: $lv_g_g = substr($key, 17, 7);
: $lv_g_g=~s/ //g;
: - - - more stuff
: }

Scrap that. Iterate over the sorted keys, then iterate over the array
referenced in the value for each key. If, as recommended earlier, the
program has stored each record's fields as an array reference, they can
be immediately recovered by dereferencing the array instead of doing all
that substr() jazz.

    foreach my $key ( sort keys %forcombo ) {
        # Insert whatever initialization is needed to process each
        # set of similar records.
        foreach my $record ( @{ $forcombo{$key} } ) {
            my(
                $lv_a_a, $lv_b_b, $lv_c_c, $lv_g_g,
                $lv_d_d, $lv_e_e, $lv_f_f,
            ) = @$record;
            # - - - more stuff
        }
        # Insert whatever steps are performed after a set of
        # similar records has been processed.
    }
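
For what it's worth, the "more stuff" might look something like the
sketch below, which adds up one numeric field across each set of
similar records. It assumes the total lives in the fourth field (the
7-character g_g field, index 3), which is a guess based on the s/ //g
in the quoted code. The sub name combine_totals and the data are made
up.

```perl
use strict;
use warnings;

# Sum one numeric field across each group of similar records.
# Assumption: field index 3 (the "g_g" field) holds the total.
sub combine_totals {
    my ($groups) = @_;    # hash ref: key => [ [fields], [fields], ... ]
    my %total;
    for my $key (sort keys %$groups) {
        my $sum = 0;
        for my $record (@{ $groups->{$key} }) {
            (my $amount = $record->[3]) =~ s/ //g;    # strip padding
            $sum += $amount;
        }
        $total{$key} = $sum;
    }
    return \%total;
}

# Made-up data: two records share a key, one stands alone.
my %forcombo = (
    'K1' => [ [qw(A001 B1 C1), '     10', qw(D1 E1 F1)],
              [qw(A002 B1 C2), '      5', qw(D1 E1 F1)] ],
    'K2' => [ [qw(A003 B2 C3), '      7', qw(D1 E1 F1)] ],
);

my $total = combine_totals(\%forcombo);
print "$_ => $total->{$_}\n" for sort keys %$total;
```

Because the per-key sums land in their own hash, nothing has to be
deleted from %forcombo while it is being iterated over.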


