Re: Use of hashes and speed - suggestions ?



jack@xxxxxxxxxxxxxxxxxxxxx wrote:
> xhos...@xxxxxxxxx wrote:
> > "Smitty" <jack.s.smith@xxxxxxxxxxxx> wrote:
> > > I have a requirement to parse a very large log file, and extract a
> > > variety of data.
> >
> > Some people consider 10 Meg to be very large, and some people consider
> > 20 Gig to still be medium.
>
> This script will be processing about 5 Gig of log files (broken down
> into 100M chunks) per day, so I guess that's not insignificant, but
> perhaps not very large either.

So the hashes will only accumulate over each 100M chunk, and will not span
all 5 Gig at one time? In that case, a modern server should be OK.

> > > ${map{$1}} = $2;
> > > ${xref{$2}} = $1;
> >
> > Why the extra curlies? $map{$1}=$2 looks much nicer.
>
> It seems to me I read somewhere that this was 'safer' for some reason;
> I immediately adopted the syntax, while simultaneously forgetting the
> reason why. Is it necessary or not ?

In this situation it is not necessary. I find it confusing, because I
initially read it as ${$xref{$2}} and was trying to figure out why you
were introducing a useless layer of scalar references. I can't think of
a situation where your usage is necessary, but there may be one.


> > > {
> > > $_ =~ /...Create object (key).../;
> > > if($1)
> >
> > Um, no. An unsuccessful match does not undef $1, it leaves it at the
> > previous value. You need to test the success of the m// operator
> > itself.
>
> Oh crap. !!!
> How many places in my other scripts have I done that !!!!!!!!!
>
> So, the matching returns an array like:
> my ($key) = ($_ =~ /...Create object (key).../);
>
> so I test $key ???

What if $key is '0' or the empty string? You would have to test the
definedness of $key, rather than its truth value.
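
A minimal sketch of the difference, assuming a made-up log format and a
made-up pattern (the real ones are elided above). In list context the match
returns its captures, or an empty list on failure, so $key is undef only
when the match failed:

```perl
use strict;
use warnings;

my @results;
# Hypothetical sample lines; note the key '0', which is false but defined.
for my $line ('Create object 0', 'Create object abc', 'something else') {
    # List-context match: captures on success, empty list on failure.
    my ($key) = $line =~ /Create object (\w+)/;
    if (defined $key) {
        push @results, "matched '$key'";
    }
    else {
        push @results, 'no match';
    }
}
print "$_\n" for @results;
```

With if ($key) in place of if (defined $key), the line carrying the key '0'
would be wrongly treated as a non-match.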

>
> or is there a preferred method.

My preferred method is

if (/...Create object (key).../) {
    # do something with $1
}
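
To see why testing the match itself matters, here is a small sketch (the
strings and the pattern are made up for illustration). The second match
fails, yet $1 still holds the capture from the first one:

```perl
use strict;
use warnings;

# First match succeeds and sets $1 to 'k1'.
my $ok_first  = 'Create object k1' =~ /Create object (\w+)/;
# Second match fails; $1 is NOT reset, it keeps the earlier capture.
my $ok_second = 'unrelated line'   =~ /Create object (\w+)/;
my $stale     = $1;

print "second match succeeded? ", ($ok_second ? 'yes' : 'no'), "\n";
print "but \$1 is still '$stale'\n";

# Testing the match operator itself avoids acting on a stale $1.
my @seen;
for my $line ('Create object k1', 'unrelated line', 'Create object k2') {
    if ($line =~ /Create object (\w+)/) {
        push @seen, $1;
    }
}
print "keys seen: @seen\n";
```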


> > > {
> > > ${created_map{$1}} = ${map{$1}} ;
> >
> > Since you can look up $map{$some_key_from_created_map} at a later time,
> > why store that value here as well as there? It just wastes memory.
> > $created_map{$1}=();
>
> I guess you mean store a null in the 'created_map' hash. Yes, good
> idea, thanks
>
> > > } else {
> > >
> > > $_ =~ /...Processed object (value).../;
> > > if($1)
> > > {
> > > ## get the key from the value
> > > my $key = ${xref{$1}};
> > > if( ${created_map{$key}} )
> >
> > if( exists ${created_map{$key}} )
> >
> > > {
> > > ## if we created it, count it
> > > $counter ++;
> > > }
> > > if( $counter >= XXXX )
> > > {
> > > ## do the work regarding the creation of the XXXth object
> > > }
> > > }
> > > }
> > > }
> >
> > Nowhere here do you use %map in any meaningful way. So you could get
> > rid of it entirely.
>
> Not sure I understand what you are saying. I reference %map within the
> same loop that the else is a part of.
>
> Could you explain ?

That I can see, the only place you access %map is to assign its keys'
values to %created_map. But then you never use the values in %created_map,
only the existence of the keys. In that case, there is no need to assign
values to %created_map. In which case, there is no need to have %map in
the first place. If you do use the values of %map (or %created_map) in
some part of the code that was elided for brevity, then you do of course
need %map.
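
A rough sketch of what I mean, assuming nothing elided needs the values.
The log lines, the "Map" line format, and the patterns below are invented
for illustration; only %xref and an existence-only %created hash are kept:

```perl
use strict;
use warnings;

# Hypothetical log lines; the real formats are elided in this thread.
my @log = (
    'Map k1 v1',
    'Map k2 v2',
    'Create object k1',
    'Processed object v1',
    'Processed object v2',    # k2 was never created, so it is not counted
);

my (%xref, %created);
my $counter = 0;

for (@log) {
    if (/^Map (\S+) (\S+)/) {
        $xref{$2} = $1;           # value -> key; no %map required
    }
    elsif (/^Create object (\S+)/) {
        $created{$1} = undef;     # only the existence of the key matters
    }
    elsif (/^Processed object (\S+)/) {
        my $key = $xref{$1};
        $counter++ if defined $key && exists $created{$key};
    }
}
print "counted $counter\n";
```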


>
> > > and
> > > can anyone suggest a better 'perlish' way which could help me achieve
> > > the same results with better performance?
> >
> > It is hard to get more perlish than hashes.
>
> Well, I was wondering about retrieving the list of values from the
> hash, rather than creating a separate hash, does perl return a
> reference to the existing values or a new list of values ?

I'm sorry, I don't understand. By list of values, do you mean a hash
slice? Or a Hash of Arrayrefs? I don't immediately see how your code could
be improved by the use of either one of those. (Well, unless there is not
a one-to-one correspondence between "key" and "value", in which case your
current code is broken so it is not merely a performance issue.) Pretty
much everything in your current code deals in a scalar context, so when you
talk about lists, I assume that refers to some alternative code you have in
mind but haven't shown?
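
For reference, a small sketch of the constructs I am asking about, using
made-up data. The last part touches on the "reference or new list" question:
per perldoc -f values, the elements returned by values() are aliases to the
hash's own values, not copies:

```perl
use strict;
use warnings;

# Made-up data for illustration only.
my %xref = (v1 => 'k1', v2 => 'k2', v3 => 'k3');

# Hash slice: fetch several values at once, as a list.
my @picked = @xref{qw(v1 v3)};

# Hash of arrayrefs: one "value" associated with several "keys".
my %multi;
push @{ $multi{v1} }, 'k1', 'k9';

# values() yields aliases, so modifying $_ here modifies %xref itself.
$_ = uc $_ for values %xref;

print "slice: @picked\n";
print "v1 maps to: @{ $multi{v1} }\n";
print "after uc: $xref{v2}\n";
```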

Xho



