Re: HOW TO PARSE A VAST FILE!

From: news.hinet.net (luke_at_program.com.tw)
Date: 03/18/04


Date: Fri, 19 Mar 2004 01:05:53 +0800

It is not an Apache log or any other standard log format.

logfile:

163.22.3.7 2003/06/28 0011:00:42 PASSED
http://proxy.ncnu.edu.tw/cgi-bin/squidGuard.cgi?clientaddr=163.22.28.128&clientname=ip128.puli28.ncnu.edu.tw&clientuser=&clientgroup=general-clients&targetgroup=moe&url=http://gatorcme.gator.com/gatorcme/autoupdate/precisiontime.ini
163.22.3.7 2003/06/28 0011:00:42 PASSED
http://163.22.3.7/gatorcme/autoupdate/precisiontime.ini
163.22.3.7 2003/06/28 0011:00:43 PASSED
http://proxy.ncnu.edu.tw/cgi-bin/squidGuard.cgi?clientaddr=163.22.9.25&clientname=ip025.puli09.ncnu.edu.tw&clientuser=&clientgroup=general-clients&targetgroup=moe&url=http://gatorcme.gator.com/gatorcme/autoupdate/installdatemanager.exe
163.22.3.7 2003/06/28 0011:00:43 PASSED
http://163.22.3.3/gatorcme/autoupdate/installdatemanager.exe
218.172.162.134 2003/06/28 0011:00:44 PASSED
http://liveupdate.symantecliveupdate.com/autoupdt.trg
218.172.162.134 2003/06/28 0011:00:44 PASSED
http://202.239.172.95/autoupdt.trg

I want to find the IPs (163.22.3.7, 218.172.162.134, ...) that do not
yet exist in MySQL.
If an IP already exists in the database, skip it; otherwise insert it.
That is all the procedure has to do.

Please help!
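
A rough sketch of the job described above, not a tested solution for the real 500 MB file. It is written in Python, with the standard sqlite3 module standing in for MySQL so that it runs on its own. The table name seen_ip, the function names, and the assumption that every record line starts with an IPv4 address followed by a yyyy/mm/dd date are all made up for illustration. The idea is to read the file once, keep each distinct IP in memory, and only then talk to the database, so the database sees one statement per distinct IP instead of one per log line.

```python
import re
import sqlite3

# Assumption: record lines start with an IPv4 address and a yyyy/mm/dd date.
# The URL lines in between start with "http://" and are skipped.
RECORD = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s+\d{4}/\d{2}/\d{2}\s")


def unique_ips(lines):
    """Collect the distinct leading IPs from an iterable of log lines."""
    seen = set()
    for line in lines:
        match = RECORD.match(line)
        if match:
            seen.add(match.group(1))
    return seen


def store_new_ips(conn, ips):
    """Insert the IPs that are not in the table yet; return how many were new."""
    # Hypothetical table; the primary key is what makes duplicates get skipped.
    conn.execute("CREATE TABLE IF NOT EXISTS seen_ip (ip TEXT PRIMARY KEY)")
    before = conn.execute("SELECT COUNT(*) FROM seen_ip").fetchone()[0]
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO seen_ip (ip) VALUES (?)",
            ((ip,) for ip in sorted(ips)),
        )
    after = conn.execute("SELECT COUNT(*) FROM seen_ip").fetchone()[0]
    return after - before


def process(log_path, db_path):
    """Stream the log file line by line and store any IPs not seen before."""
    conn = sqlite3.connect(db_path)
    try:
        with open(log_path, errors="replace") as handle:
            return store_new_ips(conn, unique_ips(handle))
    finally:
        conn.close()
```

Calling process("access.log", "ips.db") (both file names are placeholders) returns the number of newly stored IPs. With MySQL the same shape should carry over by putting a unique key on the IP column and using INSERT IGNORE, but that part is not shown or tested here.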

"Peter Hickman" <peter@semantico.com> wrote in message
news:4059d370$0$2565$afc38c87@news.easynet.co.uk...
> news.hinet.net wrote:
>
> > I must parse a 500 MB logfile!
> > How can I speed this job up?
> > Can someone give me a good idea for it?
> > Should I split the file into smaller ones and use fork, or is
> > there another method that gives good results?
> >
> >
> > PS: it takes 3 hours to complete now!
>
> OK, this is trumpet blowing, but if the log file is an Apache log file
> then there is Apache::LogRegex, which processes a logfile and returns
> you a hash of the data on each line. Pretty fast (if I say so myself)
> and available on CPAN.


