Re: Use of hashes and speed - suggestions ?
- From: xhoster@xxxxxxxxx
- Date: 08 Nov 2005 17:22:20 GMT
jack@xxxxxxxxxxxxxxxxxxxxx wrote:
> xhos...@xxxxxxxxx wrote:
> > "Smitty" <jack.s.smith@xxxxxxxxxxxx> wrote:
> > > I have a requirement to parse a very large log file, and extract a
> > > variety of data.
> >
> > Some people consider 10 Meg to be very large, and some people consider
> > 20 Gig to still be medium.
>
> This script will be processing about 5 Gig of log files (broken down
> into 100M chunks) per day, so I guess that's not insignificant, but
> perhaps not very large either.
So the hashes will only accumulate over the 100M chunks, and will not span
all 5 Gig at one time? In that case, a modern server should be OK.
> > > ${map{$1}} = $2;
> > > ${xref{$2}} = $1;
> >
> > Why the extra curlies? $map{$1}=$2 looks much nicer.
>
> It seems to me I read somewhere that this was 'safer' for some reason;
> I immediately adopted the syntax, while simultaneously forgetting the
> reason why. Is it necessary or not ?
In this situation it is not necessary. I find it confusing, because I
initially read it as ${$xref{$2}} and was trying to figure out why you
were introducing a useless layer of scalar references. I can't think of
a situation where your usage is necessary, but there may be one.
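To illustrate the difference between the two readings, here is a minimal sketch. The hash names and values are made up for the example and are not taken from the original script. The first two assignments address plain hash elements; the last expression is the one that really dereferences a scalar reference.

```perl
use strict;
use warnings;

# Hypothetical hashes, named only for this illustration.
my (%lookup, %xref);

# These two statements use the same kind of access: a plain hash element.
${lookup{'alpha'}} = 'one';    # brace-wrapped form, as in the original post
$lookup{'beta'}    = 'two';    # plain form

# This is the form the brace-wrapped one can be mistaken for:
# the hash element holds a scalar reference, and ${ ... } dereferences it.
my $text = 'payload';
$xref{'one'} = \$text;
my $through_ref = ${ $xref{'one'} };

print "$lookup{alpha} $lookup{beta} $through_ref\n";
```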
> > > {
> > > $_ =~ /...Create object (key).../;
> > > if($1)
> >
> > Um, no. An unsuccessful match does not undef $1, it leaves it at the
> > previous value. You need to test the success of the m// operator
> > itself.
>
> Oh crap. !!!
> How many places in my other scripts have I done that !!!!!!!!!
>
> So, the matching returns an array like:
> my ($key) = ($_ =~ /...Create object (key).../);
>
> so I test $key ???
What if $key is '0' or the empty string? You would have to test the
definedness of $key, rather than its truth value.
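A small sketch of that list-context approach with a definedness test. The sample lines and the pattern are assumptions, since the real ones were elided in the post.

```perl
use strict;
use warnings;

# Hypothetical log lines; the real format was not shown in the thread.
my @lines = ('Create object 0', 'Create object abc', 'unrelated line');

my @keys;
for my $line (@lines) {
    # In list context a failed match returns an empty list,
    # so $key is left undefined when the line does not match.
    my ($key) = $line =~ /Create object (\S+)/;

    # defined() keeps the key '0', which a plain truth test would drop.
    push @keys, $key if defined $key;
}

print join(',', @keys), "\n";
```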
>
> or is there a preferred method.
My preferred method is
if (/...Create object (key).../) {
# do something with $1
}
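To make the stale $1 problem concrete, here is a minimal sketch with two made-up strings and an assumed pattern. The second match fails, yet $1 still holds the capture from the first one unless the match result itself is tested.

```perl
use strict;
use warnings;

# Hypothetical inputs, invented for this illustration.
my $first  = 'Create object k1';
my $second = 'something else entirely';

$first  =~ /Create object (\S+)/;   # succeeds and sets $1
$second =~ /Create object (\S+)/;   # fails and leaves $1 untouched
my $stale = $1;                     # still the capture from $first

# Testing the match itself means $1 is only read after a success.
my $guarded;
if ($second =~ /Create object (\S+)/) {
    $guarded = $1;
}

print "stale capture: $stale\n";
print defined $guarded ? "guarded: $guarded\n" : "guarded: no match\n";
```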
> > > {
> > > ${created_map{$1}} = ${map{$1}} ;
> >
> > Since you can look up $map{$some_key_from_created_map} at a later time,
> > why store that value here as well as there? It just wastes memory.
> > $created_map{$1}=();
>
> I guess you mean store a null in the 'created_map' hash. Yes, good
> idea, thanks
>
> > > } else {
> > >
> > > $_ =~ /...Processed object (value).../;
> > > if($1)
> > > {
> > > ## get the key from the value
> > > my $key = ${xref{$1}};
> > > if( ${created_map{$key}} )
> >
> > if( exists ${created_map{$key}} )
> >
> > > {
> > > ## if we created it, count it
> > > $counter ++;
> > > }
> > > if( $counter >= XXXX )
> > > {
> > > ## do the work regarding the creation of the XXXth object
> > > }
> > > }
> > > }
> > > }
> >
> > Nowhere here do you use %map in any meaningful way. So you could get
> > rid of it entirely.
>
> Not sure I understand what you are saying. I reference %map within the
> same loop that the else is a part of.
>
> Could you explain ?
As far as I can see, the only place you access %map is to assign its keys'
values to %created_map. But then you never use the values in %created_map,
only the existence of the keys. In that case, there is no need to assign
values to %created_map. In which case, there is no need to have %map in
the first place. If you do use the values of %map (or %created_map) in
some part of the code that was elided for brevity, then you do of course
need %map.
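A rough sketch of what that looks like when %created_map is used purely as a set of keys. The data and the shortened name %created are hypothetical; only the idea of storing no value and testing with exists comes from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical data standing in for what the log parser would collect.
my %xref = (v1 => 'k1', v2 => 'k2');   # value => key
my %created;                           # used only as a set of keys

# Record that k1 was created; the stored value is never looked at.
$created{'k1'} = undef;

my $counter = 0;
for my $value (qw(v1 v2 v3)) {
    my $key = $xref{$value};
    next unless defined $key;          # v3 has no known key

    # exists() is true even though the stored value is undef.
    $counter++ if exists $created{$key};
}

print "$counter\n";
```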
>
> > > and
> > > can anyone suggest a better 'perlish' way which could help me achieve
> > > the same results with better performance?
> >
> > It is hard to get more perlish than hashes.
>
> Well, I was wondering about retrieving the list of values from the
> hash, rather than creating a separate hash, does perl return a
> reference to the existing values or a new list of values ?
I'm sorry, I don't understand. By list of values, do you mean a hash
slice? Or a Hash of Arrayrefs? I don't immediately see how your code could
be improved by the use of either one of those. (Well, unless there is not
a one-to-one correspondence between "key" and "value", in which case your
current code is broken so it is not merely a performance issue.) Pretty
much everything in your current code deals in a scalar context, so when you
talk about lists, I assume that refers to some alternative code you have in
mind but haven't shown?
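For reference, this is roughly what those two constructs look like. The hash names and contents are invented for the example and are not a suggestion for the posted code.

```perl
use strict;
use warnings;

# Hypothetical one-to-one mapping.
my %map = (k1 => 'v1', k2 => 'v2', k3 => 'v3');

# Hash slice: fetch the values for several keys in one expression.
my @some = @map{qw(k1 k3)};

# Hash of arrayrefs: one key can hold several values, which is what
# would be needed if the mapping were not one-to-one.
my %values_for;
push @{ $values_for{'k1'} }, 'v1', 'v1b';
push @{ $values_for{'k2'} }, 'v2';

print "slice: @some\n";
print "k1 has ", scalar @{ $values_for{'k1'} }, " values\n";
```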
Xho