Re: Combining multiple hash references into one hash reference



xhoster@xxxxxxxxx wrote:

Arvin Portlock  wrote:

>my %newhash = (%$hash1, %$hash2);
>
># The following do not work:
># my $newhash = { $hash1, $hash2 };
># my $newhash = [ $hash1, $hash2 ];
># my %newhash = ( $hash1, $hash2 );

Maybe this is more to your liking:
my $newhash = { %$hash1, %$hash2 };

That dereferencing % there makes me nervous. It looks to me
like a new hash is being created and then a reference to it
is being assigned to $newhash. So each of the thousands of
instances of $newhash will have (a reference to) its very
own copy of %hash1 and %hash2.
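
A minimal sketch of that concern, using made-up keys ("size", "type") rather than anything from the real program: the anonymous hash constructor copies the keys and values (a shallow copy), so a later change to the original does not show up in the new hash.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only
my $hash1 = { size => 10 };
my $hash2 = { type => 'image' };

# { ... } builds a new anonymous hash; the keys and values are copied
my $newhash = { %$hash1, %$hash2 };

# Changing the original afterwards does not affect the copy
$hash1->{size} = 99;

print "copy: $newhash->{size}, original: $hash1->{size}\n";
```

If the values were themselves references (nested hashes, say), only the references would be copied, not the data they point to.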

$hash1 and $hash2 do not go out of scope. Also they are not
the only hashes. There are typically 20 or 30 of them
throughout the life of the program. Each XML element contains
some combination from among those 20 or 30. E.g.,

$newhash1 = { %$hash40, %$hash31, %$hash12 };
$newhash10012 = { %$hash1, %$hash21, %$hash26 };

In my XML document (METS for the curious), there are some
30 or 40 elements at the top of the document, then further
down there are some thousands that reference some of those
elements with attributes of type IDREFS.

<element id="id1"> ... </element>
<element id="id2"> ... </element>
....
<element id="id30"> ... </element>

<refelement ids="id1 id6 id21"/>
<refelement ids="id22 id11 id21"/>
.... etc. for thousands of <refelements>

Each of the thousands of elements is stored in an array.
At the end of the program I will loop through each of the
thousands of elements and extract certain values from each
one. I want to be able to extract those values by a key
name (which is why an array won't quite work as I can't
access the elements efficiently by key name).

BTW, the above was only an attempt to simplify the problem.
In reality of course I won't be naming my hashes %hash1,
%hash2, etc. Nor will I name the reference elements
$newhash12, $newhash643, etc. The <elements> will live
in a small hash keyed by the id. The <refelement>s will
live in a large array. And I want to be able to write
things like this:

foreach my $refelement (@bigarray) {
   print $refelement->{size}, "\n";
   print $refelement->{type}, "\n";
}

Where "size" and "type" are typical keys from
among the original 20 or 30 elements (assuming
<refelement ids="id1 id6 id21"/>, "size" may be
a key from the element referenced by "id1", "type"
may come from "id21", and so on).

I'm trying to simplify the problem without posting the
entire huge program, but this may be a bit closer to
what I want (except it doesn't quite work):

my $hashelements = {
   '1' => {
      'key1' => 'Value 1',
      'key2' => 'Value 2',
      'key3' => 'Value 3'
    },

   '6' => {
      'key4' => 'Value 4',
      'key5' => 'Value 5',
      'key6' => 'Value 6'
    },

   '21' => {
      'key7' => 'Value 7',
      'key8' => 'Value 8',
      'key9' => 'Value 9'
    },
};

my $newhash = {};
foreach my $id (1, 6, 21) {
   foreach my $key (keys %{$hashelements->{$id}}) {
      $newhash->{$key} = \{$hashelements->{$id}->{$key}};
   }
}

foreach my $key (keys %$newhash) {
   print "$key: ", $newhash->{$key}, "\n";
}

That $newhash->{$key} = \{$hashelements->{$id}->{$key}} part
is an attempt to make sure I only create a reference to
the value rather than make a copy of the value itself.

Perhaps using "each" somehow is the answer. Can't quite get
that to work either though.
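
For what it's worth, here is a sketch of what I think the reference-taking line would need to look like. As far as I can tell, \{ ... } is parsed as a reference to a new anonymous hash rather than a reference to the existing value, which would explain why it prints something like REF(0x...). Putting the backslash directly on the hash element gives a scalar reference, which then has to be dereferenced with ${ ... } when printing. The data is a cut-down version of the example above.

```perl
use strict;
use warnings;

# Cut-down version of the sample data above
my $hashelements = {
   '1' => { 'key1' => 'Value 1' },
   '6' => { 'key4' => 'Value 4' },
};

my $newhash = {};
foreach my $id (1, 6) {
   foreach my $key (keys %{ $hashelements->{$id} }) {
      # Backslash on the element itself: a scalar reference, not a copy
      $newhash->{$key} = \$hashelements->{$id}{$key};
   }
}

foreach my $key (sort keys %$newhash) {
   # ${ ... } dereferences the scalar reference
   print "$key: ", ${ $newhash->{$key} }, "\n";
}
```

Whether this saves anything is doubtful for short strings, since each reference is itself a scalar; it probably only pays off when the values are large.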

Arvin




>foreach my $key (keys %newhash) {
>   print "$key: $newhash{$key}\n";
>}


Presumably, that isn't all you are doing, because if it were you would just use two loops, one for hash1 and one for hash2, and never make the combined hash in the first place. And not making the combined hash in the first place is, of course, the best solution if you can get away with it. If you need something more general than just those two hashes, then use an AoH (array of hashes) with a nested loop for printing.
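
A rough sketch of that idea, under assumptions: the ids, keys, and the lookup() helper are all made up for illustration. Each entry holds only references to the hashes it needs, and a key lookup walks them in order, so no combined hash is ever built.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the original hashes, keyed by id
my %elements = (
   id1  => { size => '10k' },
   id21 => { type => 'image' },
);

# Each entry is an array of references to existing hashes (no copies)
my @bigarray = (
   [ @elements{qw(id1 id21)} ],
);

# Hypothetical helper: return the first value found for $key, else nothing
sub lookup {
   my ($refelement, $key) = @_;
   foreach my $hash (@$refelement) {
      return $hash->{$key} if exists $hash->{$key};
   }
   return;
}

foreach my $refelement (@bigarray) {
   print lookup($refelement, 'size'), "\n";
   print lookup($refelement, 'type'), "\n";
}
```

The trade-off is a short linear scan per lookup instead of one hash access, in exchange for storing only a few references per entry.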


>But I'm concerned I'm creating copies of each of
>these elements for all of the thousands of instances
>of %newhash I will be creating.


Will $hash1 and $hash2 go out of scope or get redefined shortly after %newhash (or $newhash) is created from them? If so, you most likely needn't worry on the memory front. And will all these thousands of instances of %newhash also be properly scoped?


>Is there a faster and
>memory efficient way to do this?


Is this micro-optimization week or something?

Would it be acceptable to add %$hash2 into %$hash1 rather than making
a brand new %newhash?  If so,
@{$hash1}{keys %$hash2}=values %$hash2;
is somewhat more memory efficient.

If not, then:
my %newhash=%$hash1;
undef $hash1;
@newhash{keys %$hash2}=values %$hash2;


Xho

