Re: problem with hash & sort array
From: Jay Tilton (tiltonj_at_erols.com)
Date: 12/31/03
- Next message: Michael Capone: "Re: Time out SSL request?"
- Previous message: James Willmore: "Re: Benchmarking a script that can't be executed form the command line"
- In reply to: uNConVeNtiOnAL: "problem with hash & sort array"
- Next in thread: uNConVeNtiOnAL: "Re: problem with hash & sort array"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 31 Dec 2003 04:43:47 GMT
[Please be aware of how your news client handles word-wrapping of long
lines. The quoted text below has been reformatted.]
uNConVeNtiOnAL <tomcat@visi.com> wrote:
: I am trying to read from an file and put the lines into a
: hash. Then I put the hash into an array with the sort
: command. This sort will put the array into such order that
: I can see if duplicate lines occur and add their numeric
: total field together. I will add the combined data as new
: hash entries and then remove the original lines that were
: duplicates.
:
: I don't seem to be putting any values in the hash
What sequence of debugging steps leads you to that conclusion?
: before
: you go off on me, this code is very similar to code that is
: working. The twist is I have to identify the duplicate
: data, create a new entry for it (rename one of the elements
: so it is distinguishable from the duplicates), and remove
: all duplicate lines.
Is that relevant? The portions of the program that do that much seem to
have been eliminated in your article.
: open (my_file, "$ARGV[0]") || die "ERROR: missing file";
:
: #load up vmi hash to sort and combine duplicate records
: while (<my_file>)
: {
: chomp;
: $a_a = substr ($_, 0, 4);
: $b_b = substr ($_, 12, 11);
: c_c = substr ($_, 24, 2);
: d_d = substr ($_, 54, 4);
: e_e = substr ($_, 59, 2);
: f_f = substr ($_, 62, 2);
: g_g = substr ($_, 46, 7);
I guess there are supposed to be a few more '$' sigils on the LHS of
those assignments.
Consider Perl's unpack() function as an alternative to substr() for
plucking fixed-width fields from a record. That might go like:
    my @fields =
        unpack 'A4 x8 A11 x1 A2 x20 A7 x1 A4 x1 A2 x1 A2', $_;
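To check that the template picks out the same columns as the seven substr() calls, a small comparison along these lines may help. The 64-column record is invented purely for illustration; only the offsets and the template come from the code above.

```perl
use strict;
use warnings;

# Hypothetical 64-column record; the field contents are made up.
# An empty string under %-8s, %1s, etc. becomes that many pad spaces.
my $record = sprintf '%-4s%-8s%-11s%1s%-2s%-20s%-7s%1s%-4s%1s%-2s%1s%-2s',
    'AAAA', '', 'BBBBBBBBBBB', '', 'CC', '', 'GGGGGGG', '',
    'DDDD', '', 'EE', '', 'FF';

# The original substr() offsets, listed in the order unpack returns them.
my @by_substr = (
    substr($record,  0,  4),    # a_a
    substr($record, 12, 11),    # b_b
    substr($record, 24,  2),    # c_c
    substr($record, 46,  7),    # g_g
    substr($record, 54,  4),    # d_d
    substr($record, 59,  2),    # e_e
    substr($record, 62,  2),    # f_f
);

# 'A' takes a fixed-width text field, 'x' skips filler columns.
my @fields = unpack 'A4 x8 A11 x1 A2 x20 A7 x1 A4 x1 A2 x1 A2', $record;

print join('|', @fields), "\n";
print "substr and unpack agree\n" if "@by_substr" eq "@fields";
```

One difference to keep in mind: the 'A' format strips trailing spaces from each field, which substr() does not do.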
: $forcombo{"$b_b$d_d$e_e$f_f"} =
: $forcombo{"$b_b$d_d$e_e$f_f"}."^"."$a_a$b_b$c_c$g_g$d_d$e_e$f_f";
: }
You're using string concatenation in the hash value to mimic an array of
arrays. To get the original fields back later, the program has to burst
the string into records, then pluck the fields out of each record again.
This scheme is terribly fragile, not to mention repetitious.
Using a real array reference for the hash value, then pushing a
reference to the array containing the fields is a much saner approach.
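    push @{ $forcombo{ join $;, @fields[1, 4, 5, 6] } }, \@fields;
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The join is written out because a slice inside the subscript is
evaluated in scalar context and would yield only its last element. The
underscored portion produces the same key as Perl's largely ignored
"multidimensional hash emulation" feature (see the entry for ``$;'' in
perlvar), under which
    $forcombo{ $fields[1], $fields[4], $fields[5], $fields[6] }
is just like saying:
    $forcombo{join($; , $fields[1], $fields[4], $fields[5], $fields[6])}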
That feature exists as a mechanism to mimic complex data structures. It
seems appropriate in this case, since you are concerned more with
collecting similar records together than with having an obsessively
organized data structure.
So in one place I recommend against using string concatenation to mimic
a real data structure, and in the next place I make the exact opposite
recommendation. I'm rather enjoying the apparent paradox.
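A minimal sketch of the grouping step, run on a few invented records rather than the real data file. The records are array references already in unpack order, and the join on ``$;'' is spelled out so the sketch does not depend on how the subscript is parsed.

```perl
use strict;
use warnings;

# Invented records in unpack order: a, b, c, g, d, e, f.
my @records = (
    [ 'A001', 'WIDGET', 'X1', '10', '2003', '12', '30' ],
    [ 'A002', 'WIDGET', 'X2', '5',  '2003', '12', '30' ],
    [ 'A003', 'GADGET', 'X1', '7',  '2003', '12', '31' ],
);

my %forcombo;
foreach my $fields (@records) {
    # Records that agree on fields b, d, e, f share one key.
    my $key = join $;, @{$fields}[1, 4, 5, 6];
    push @{ $forcombo{$key} }, $fields;
}

foreach my $key (sort keys %forcombo) {
    my $group = $forcombo{$key};
    # Rebuild a printable label from the first record in the group.
    my $label = join '/', @{ $group->[0] }[1, 4, 5, 6];
    printf "%s: %d record(s)\n", $label, scalar @$group;
}
```

When the references come from \@fields inside a read loop, @fields has to be declared with my inside that loop. Otherwise every stored reference points at the same array, and all the records end up looking like the last line read.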
: close my_file;
:
: @keys = split(/\^/,$forcombo{"$b_bd_de_ef_f"});
^^^^^^^^^^^^^
I guess there are some more missing '$' sigils in there.
This part of the process should be about iterating over the hash values,
and is most probably where the program is going off its rails. Those
scalars were used in creating the %forcombo hash from the data file
contents, but that step is over, and the scalars' values are stale.
"use strict;" and proper variable scoping prevents this kind of mistake.
: foreach $key (sort(@keys))
: {
: #printf nodupes_file "$key\n";
: $lv_b_b = substr($key, 4 ,11);
: $lv_a_a = substr($key, 0, 4);
: $lv_c_c = substr($key, 15, 2);
: $lv_d_d = substr($key, 24,4);
: $lv_e_e = substr($key, 28,2);
: $lv_f_f = substr($key, 30,2);
: $lv_g_g = substr($key, 17, 7);
: $lv_g_g=~s/ //g;
: - - - more stuff
: }
Scrap that. Iterate over the sorted keys, then iterate over the array
referenced in the value for each key. If, as recommended earlier, the
program has stored each record's fields as an array reference, they can
be immediately recovered by dereferencing the array instead of doing all
that substr() jazz.
    foreach my $key ( sort keys %forcombo ) {
        # Insert whatever initialization is needed to process each
        # set of similar records.
        foreach my $record ( @{ $forcombo{$key} } ) {
            my(
                $lv_a_a, $lv_b_b, $lv_c_c, $lv_g_g,
                $lv_d_d, $lv_e_e, $lv_f_f,
            ) = @$record;
            # - - - more stuff
        }
        # Insert whatever steps are performed after a set of
        # similar records has been processed.
    }
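Filled in for the stated goal of adding up the totals of duplicate lines, the loop might look like the sketch below. The article does not say which field holds the numeric total, so treating $lv_g_g as the total is purely an assumption here, and the records are invented.

```perl
use strict;
use warnings;

# Invented records in unpack order: a, b, c, g, d, e, f.
# ASSUMPTION: g (index 3) is the numeric total field.
my %forcombo;
foreach my $fields (
    [ 'A001', 'WIDGET', 'X1', '     10', '2003', '12', '30' ],
    [ 'A002', 'WIDGET', 'X2', '      5', '2003', '12', '30' ],
    [ 'A003', 'GADGET', 'X1', '      7', '2003', '12', '31' ],
) {
    push @{ $forcombo{ join $;, @{$fields}[1, 4, 5, 6] } }, $fields;
}

my %total_for;
foreach my $key ( sort keys %forcombo ) {
    my $total = 0;    # per-group initialization
    foreach my $record ( @{ $forcombo{$key} } ) {
        my(
            $lv_a_a, $lv_b_b, $lv_c_c, $lv_g_g,
            $lv_d_d, $lv_e_e, $lv_f_f,
        ) = @$record;
        $lv_g_g =~ s/ //g;    # same space-stripping as the original
        $total += $lv_g_g;
    }
    # Per-group wrap-up: remember and report the combined total.
    $total_for{$key} = $total;
    printf "%-8s %d record(s), total %d\n",
        $forcombo{$key}[0][1], scalar @{ $forcombo{$key} }, $total;
}
```

A group holding a single record is simply a line that had no duplicates, so the same loop covers both cases.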