Re: Regexp - alternate match and grouping



Witold Rugowski wrote:

I need to do some grouping in regexps, but the data can come in different formats. I'm trying to gather some data from syslog servers: I want to extract the client hostname (from FreeBSD syslog) or the client's IP (from Webtrends syslog).

The first ones look like:
Feb 28 00:00:00 HOSTNAME Feb 28 2006 01:00:00 HOSTNAME : %PIX-6-305011 [cut]

And from Webtrends:
WTsyslog[2006-02-26 23:59:59 ip=IP_ADDRESS pri=6] <14>Feb 26 2006 23:59:59: %PIX-6-302016: [cut]

Currently I'm matching it with:
/(?:([\w\d\-\_\.]*) |ip=([.\d]*).*?)(\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2})[\w\d\-\_\.: ]*?%PIX[and more]/

Yuck yuck yuck. Use the /x modifier to increase the readability of
your regexp.
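
For illustration, here is roughly how the regexp quoted above might be laid out under /x. This is only a sketch: the "[and more]" tail is left out, the character classes are shortened to the equivalent [\w.-] form, and the two sample lines use a made-up host name (fw1) and address (192.0.2.10). Under /x a literal space has to be written as [ ] or escaped, and the comments must not contain $ or @ because the pattern is interpolated.

```perl
use strict;
use warnings;

my $re = qr{
    (?:
        ( [\w.-]* ) [ ]        # group 1: host name, then one space (FreeBSD syslog)
      |
        ip= ( [.\d]* ) .*?     # group 2: IP address (Webtrends syslog)
    )
    ( \w{3} [ ] \d{2} [ ] \d{4} [ ] \d{2}:\d{2}:\d{2} )   # group 3: date with year
    [\w.: -]*?                 # optional host name and separators
    %PIX                       # start of the PIX message
}x;

# Made-up sample lines in the two formats.
my $freebsd_line   = 'Feb 28 00:00:00 fw1 Feb 28 2006 01:00:00 fw1 : %PIX-6-305011: test';
my $webtrends_line = 'WTsyslog[2006-02-26 23:59:59 ip=192.0.2.10 pri=6] <14>Feb 26 2006 23:59:59: %PIX-6-302016: test';

for my $line ($freebsd_line, $webtrends_line) {
    if (my ($host, $ip, $date) = $line =~ $re) {
        # Only one of the first two groups is defined for any given line.
        my $source = defined $host ? $host : $ip;
        print "$source | $date\n";
    }
}
```

The layout changes nothing about what is matched: the host name still lands in group 1 and the IP address in group 2.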

But this means that either $1 or $2 is defined, depending on the input data format. Is there a better way to do it? Better for me means that $1 is always the HOSTNAME or IP address and $2 is always the date...

Why? Why are you feeling the need to make this one massive regexp?
You're matching two completely different formats. It makes no sense
that one regexp should be able to handle both.

[untested]

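# Date as it appears in both sample formats, e.g. "Feb 28 2006 01:00:00".
my $date_pat = qr/[A-Za-z]{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}/;
my ($host_or_ip, $date);
# FreeBSD syslog: the host name sits directly before the date.
# Not anchored with ^, because the line starts with syslog's own timestamp.
if (/([\w.-]+) ($date_pat)/) {
    ($host_or_ip, $date) = ($1, $2);
# Webtrends: the IP address follows "ip=", the date comes later in the line.
} elsif (/ip=([.\d]+).*?($date_pat)/) {
    ($host_or_ip, $date) = ($1, $2);
} else {
    die "Unknown format in log file";
}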
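
To try the two-pattern approach outside a read loop, something like the following should work. It is a sketch under a few assumptions: the sub name parse_log_line is made up, the sample lines use a placeholder host name (fw1) and address (192.0.2.10), and the date pattern expects the four-digit year that both sample formats show.

```perl
use strict;
use warnings;

# Assumed date layout, taken from the samples: "Feb 28 2006 01:00:00".
my $date_pat = qr/[A-Za-z]{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}/;

# Hypothetical helper: returns (host_or_ip, date), or an empty list
# when the line matches neither format.
sub parse_log_line {
    my ($line) = @_;
    # FreeBSD syslog: the host name sits directly before the dated stamp.
    return ($1, $2) if $line =~ /([\w.-]+) ($date_pat)/;
    # Webtrends: the client address follows "ip=".
    return ($1, $2) if $line =~ /ip=([.\d]+).*?($date_pat)/;
    return;
}

# Made-up sample lines in the two formats.
my @lines = (
    'Feb 28 00:00:00 fw1 Feb 28 2006 01:00:00 fw1 : %PIX-6-305011: test',
    'WTsyslog[2006-02-26 23:59:59 ip=192.0.2.10 pri=6] <14>Feb 26 2006 23:59:59: %PIX-6-302016: test',
);

for my $line (@lines) {
    # A list assignment in boolean context yields the number of values
    # returned, so an empty return triggers the die.
    my ($host_or_ip, $date) = parse_log_line($line)
        or die "Unknown format in log file: $line";
    print "$host_or_ip\t$date\n";
}
```

Either way the caller gets the host name or IP address in the first slot and the date in the second, which is what the original question asked for.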


Paul Lalli
