Re: perl multithreading performance



On Wed, 27 Aug 2008 12:59:36 -0700 (PDT) dniq00@xxxxxxxxx wrote:

d> What the script does is for each line it checks if the line contains
d> GET request, and if it does - goes through a list of pre-compiled
d> regular expressions, trying to find a matching one. Once the match is
d> found - it uses another regexp, associated with the found match, which
d> is a bit more complex, to extract data from the line. I have split it
d> in two separate matches, because about 30% of all lines will match,
d> and I don't want to run that complex regexp to extract data for all
d> the lines I know won't match. The goal is to count how many lines
d> matched for every specific regexp, and the end result is built as a
d> hash, having data, extracted from the line with second regexp, used as
d> hash keys, and the value is the number of matches.

d> Anyway, currently all this is done in a single process, which parses
d> approx. 30000 lines per second. The CPU usage for this process is
d> 100%, so the bottleneck is in the parsing part.
....
d> Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
d> faster than single-process script, consumes about 2-3 times more
d> memory and about as much times more CPU.
....
d> Any ideas why in the world it's so slow? I did some research and
d> couldn't find a lot of info, other than the way I do it pretty much
d> the way it should be done, unless I'm missing something...

You may be hitting the limits of I/O. Try feeding your script
pre-canned data from memory in a loop and see if that improves
performance. It also depends on what kind of processing you are doing
on input lines.
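
Something along these lines could serve as a rough test. It is only a
sketch: the patterns, the sample log lines and the count_matches name are
made up for illustration and are not taken from your script. It runs the
two-stage match (cheap filter first, extraction regexp second) over lines
held in memory, so the timing contains no I/O at all. If the rate here is
close to your 30000 lines per second, the regexps are the cost; if it is
much higher, look at how the data gets read and handed around.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical rules: a cheap pre-compiled filter paired with a more
# expensive extraction regexp, mirroring the two-step approach described.
my @rules = (
    [ qr{/images/},  qr{GET (/images/\S+) HTTP} ],
    [ qr{/cgi-bin/}, qr{GET (/cgi-bin/[^?\s]+)} ],
);

# Pre-canned sample lines kept in memory, so no disk or pipe I/O is timed.
my @lines = (
    '10.0.0.1 - - [27/Aug/2008:12:59:36 -0700] "GET /images/logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [27/Aug/2008:12:59:37 -0700] "POST /cgi-bin/form HTTP/1.1" 200 90',
    '10.0.0.3 - - [27/Aug/2008:12:59:38 -0700] "GET /cgi-bin/search?q=perl HTTP/1.1" 200 77',
);

# Run the whole line set $passes times and return a hash ref that maps
# each extracted string to the number of times it was seen.
sub count_matches {
    my ($lines, $passes) = @_;
    my %count;
    for (1 .. $passes) {
        LINE: for my $line (@$lines) {
            # Skip lines that do not contain a GET request at all.
            next LINE if index($line, 'GET') < 0;
            for my $rule (@rules) {
                my ($filter, $extract) = @$rule;
                next unless $line =~ $filter;
                # Only the first matching rule gets to extract and count.
                $count{$1}++ if $line =~ $extract;
                next LINE;
            }
        }
    }
    return \%count;
}

my $passes  = 100_000;
my $start   = time;
my $count   = count_matches(\@lines, $passes);
my $elapsed = time - $start;

printf "%.0f lines/sec from memory\n", (@lines * $passes) / $elapsed
    if $elapsed > 0;
```

Replace the sample lines with a few thousand real ones from your logs
before trusting the number, since the mix of matching and non-matching
lines changes the result a lot.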

Also, check out the swatch log file monitor; it may already do what you
need.

Ted