Re: Use of hashes and speed - suggestions ?
- From: jack@xxxxxxxxxxxxxxxxxxxxx
- Date: 8 Nov 2005 08:40:59 -0800
xhos...@xxxxxxxxx wrote:
> "Smitty" <jack.s.smith@xxxxxxxxxxxx> wrote:
> > I have a requirement to parse a very large log file, and extract a
> > variety of data.
>
> Some people consider 10 Meg to be very large, and some people consider
> 20 Gig to still be medium.
This script will be processing about 5 Gig of log files (broken down
into 100M chunks) per day, so I guess that's not insignificant, but
perhaps not very large either.
> > The main requirement is to discover at what time I have processed XXX
> > of the 'created' key objects.
>
> How do you know which of the created objects are obscene?
Funny. OK, 'some number', as represented by XXX.
>
> > So I was imagining I would need another
> > hash with the key being the value and the value being the key from
> > above, so I would also have a Xref in the above loop like this.
> >
> > ## process the MAP entries
> > my %map = ();
> > my %xref = ();
> > while(<>)
> > {
> > $_ =~ /...(key).is a ..(value).../;
> > ${map{$1}} = $2;
> > ${xref{$2}} = $1;
>
> Why the extra curlies? $map{$1}=$2 looks much nicer.
It seems to me I read somewhere that this was 'safer' for some reason;
I immediately adopted the syntax, while simultaneously forgetting the
reason why. Is it necessary or not?
>
> > }
> >
> > ## process the 'created' and 'processed' entries
> > my $counter = 0;
> > my %created_map = ();
> > while(<>)
>
> I hope you reset the <> iterator somewhere.
Well, actually, the first bit, filling the 'map' hash, will have a
'last' in it somewhere, but I neglected to mention it, since it really
isn't that important to the main issue.
>
> > {
> > $_ =~ /...Create object (key).../;
> > if($1)
>
> Um, no. An unsuccessful match does not undef $1, it leaves it at the
> previous value. You need to test the success of the m// operator itself.
Oh crap. !!!
How many places in my other scripts have I done that !!!!!!!!!
So, in list context the match returns the captures, like:
my ($key) = ($_ =~ /...Create object (key).../);
so I test $key? Or is there a preferred method?
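
For what it's worth, here is a minimal sketch of the two ways I can see to test the match itself rather than $1. The sub names and the (\w+) pattern are my own placeholders, since the real log format is elided above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured key, or undef on no match.
# The pattern is a stand-in for the real (elided) log format.
sub created_key {
    my ($line) = @_;
    # A list assignment in scalar context yields the number of captures
    # returned, so the condition is false only when the match fails.
    if (my ($key) = $line =~ /Create object (\w+)/) {
        return $key;
    }
    return;
}

# Shows the pitfall: $1 survives a failed match in the same scope.
sub stale_capture {
    my ($first, $second) = ('Create object abc', 'Processed object xyz');
    $first  =~ /Create object (\w+)/;   # succeeds, sets $1 to 'abc'
    $second =~ /Create object (\w+)/;   # fails, $1 is left untouched
    return $1;
}
```

Testing the assignment (or the m// directly in the if) avoids the stale $1, and it also behaves correctly when the captured key happens to be "0".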
>
>
> > {
> > ${created_map{$1}} = ${map{$1}} ;
>
> Since you can look up $map{$some_key_from_created_map} at a later time, why
> store that value here as well as there? It just wastes memory.
> $created_map{$1}=();
I guess you mean store a null in the 'created_map' hash. Yes, good
idea, thanks
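
As I understand it, that turns the hash into a set, which is why the later test needs exists rather than a truth test. A small sketch, with made-up keys:

```perl
use strict;
use warnings;

my %created_map;
$created_map{abc} = ();    # stores undef: the key exists, the value is false

my $truthy  = $created_map{abc}        ? 1 : 0;   # 0, the value is undef
my $present = exists $created_map{abc} ? 1 : 0;   # 1, the key is there
my $missing = exists $created_map{xyz} ? 1 : 0;   # 0, the key was never stored
```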
> > } else {
> >
> > $_ =~ /...Processed object (value).../;
> > if($1)
> > {
> > ## get the key from the value
> > my $key = ${xref{$1}};
> > if( ${created_map{$key}} )
>
> if( exists ${created_map{$key}} )
>
> > {
> > ## if we created it, count it
> > $counter ++;
> > }
> > if( $counter >= XXXX )
> > {
> > ## do the work regarding the creation of the XXXth object
> > }
> > }
> > }
> > }
>
> Nowhere here do you use %map in any meaningful way. So you could get
> rid of it entirely.
Not sure I understand what you are saying. I reference %map within the
same loop that the else is a part of.
Could you explain ?
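
My best guess at what is meant, as a sketch only: once created_map stores no values, the counting needs just the value-to-key lookup. The log lines and patterns below are invented for illustration, since the real format is elided above.

```perl
use strict;
use warnings;

# Hypothetical log lines; the real format is not shown in this thread.
my @log = (
    'k1 is a v1',
    'k2 is a v2',
    'Create object k1',
    'Processed object v1',
    'Processed object v2',    # k2 was never created, so it is not counted
);

my (%xref, %created);
my $counter = 0;

for my $line (@log) {
    if (my ($key, $value) = $line =~ /^(\w+) is a (\w+)$/) {
        $xref{$value} = $key;            # value -> key, the only lookup used
    }
    elsif (my ($new_key) = $line =~ /^Create object (\w+)$/) {
        $created{$new_key} = ();         # set membership only
    }
    elsif (my ($done) = $line =~ /^Processed object (\w+)$/) {
        my $owner = $xref{$done};
        $counter++ if defined $owner && exists $created{$owner};
    }
}
```

With this sample input $counter ends at 1, and nothing ever reads a key-to-value hash.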
> > and
> > can anyone suggest a better 'perlish' way which could help me achieve
> > the same results with better performance?
>
> It is hard to get more perlish than hashes.
Well, I was wondering about retrieving the list of values from the
hash, rather than creating a separate hash. Does perl return a
reference to the existing values or a new list of values?
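
A quick experiment along these lines should settle it. This is only a sketch with made-up keys: per perlfunc, values() does not copy the values when iterated over, but assigning its result to an array does.

```perl
use strict;
use warnings;

my %map = (k1 => 'v1', k2 => 'v2');

# In a foreach, each element of values() is an alias to the stored value,
# so assigning to $_ changes the hash itself.
$_ = uc $_ for values %map;

# Assigning values() to an array copies the scalars instead.
my @copy = values %map;
$_ = 'changed' for @copy;

my $after = join ',', sort values %map;   # 'V1,V2', untouched by @copy

# The inverse lookup can also be built in one step, assuming the values
# are unique (for duplicates, only one of the keys survives).
my %xref = reverse %map;
```

Either way, a plain list of values gives no fast lookup from value to key, so the separate hash still looks necessary for that part.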