Re: Regexp - alternate match and grouping
- From: "Paul Lalli" <mritty@xxxxxxxxx>
- Date: 28 Feb 2006 05:23:05 -0800
Witold Rugowski wrote:
> I need to do some grouping in regexps, but the data can come in different formats. I'm trying to gather some data from syslog servers: I want to extract the client hostname (from FreeBSD syslog) or the client's IP (from Webtrends syslog).
> The first ones look like:
> Feb 28 00:00:00 HOSTNAME Feb 28 2006 01:00:00 HOSTNAME : %PIX-6-305011 [cut]
> And from Webtrends:
> WTsyslog[2006-02-26 23:59:59 ip=IP_ADDRESS pri=6] <14>Feb 26 2006 23:59:59: %PIX-6-302016: [cut]
> Currently I'm matching it with:
> /(?:([\w\d\-\_\.]*) |ip=([.\d]*).*?)(\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2})[\w\d\-\_\.: ]*?%PIX[and more]/
Yuck yuck yuck. Use the /x modifier to increase the readability of
your regexp.
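Taking that advice, here is a sketch of how the poster's pattern might read with /x. The "[and more]" tail of the original is elided in the post, so this version stops at %PIX. The redundant \d, \- and \_ inside the character classes are dropped because \w already covers digits and underscores. The name $pix_re is only for illustration.

```perl
use strict;
use warnings;

# The pattern from the post (up to %PIX), rewritten with x.
# Under x, unescaped whitespace and #-comments are ignored outside
# character classes, so the literal spaces in the date are spelled \x20.
# Spaces inside [...] are still literal (unless xx is used).
my $pix_re = qr/
    (?:
        ([\w.-]*) \x20                  # group 1: FreeBSD host name
      |
        ip= ([.\d]*) .*?                # group 2: Webtrends client IP
    )
    (                                   # group 3: date, e.g. "Feb 28 2006 01:00:00"
        \w{3} \x20 \d{2} \x20 \d{4} \x20 \d{2}:\d{2}:\d{2}
    )
    [\w.: -]*?                          # anything up to the PIX tag
    %PIX
/x;
```

Assuming the IP_ADDRESS placeholder stands for a dotted-decimal address, matching the two sample lines shows the inconsistency the poster describes: the host name comes back in $1 but the IP in $2, with the date in $3 either way.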
> But this means that either $1 or $2 is defined, depending on the input data format. Is there some better way to do it? Better for me means that $1 is always the HOSTNAME or IP address and $2 is always the date...
Why? Why are you feeling the need to make this one massive regexp?
You're matching two completely different formats. It makes no sense
that one regexp should be able to handle both.
[untested]
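use strict;
use warnings;
# Date in the PIX message, e.g. "Feb 28 2006 01:00:00" (includes the year)
my $date_pat = qr/[A-Za-z]{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}/;
while (<>) {
    my ($host_or_ip, $date);
    if (/([\w.-]+) ($date_pat)/) {              # FreeBSD: HOSTNAME, then the date
        ($host_or_ip, $date) = ($1, $2);
    } elsif (/ip=([.\d]+).*?($date_pat)/) {     # Webtrends: ip=..., then the date
        ($host_or_ip, $date) = ($1, $2);
    } else {
        die "Unknown format in log file: $_";
    }
    print "$host_or_ip\t$date\n";
}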
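If a single pattern is still wanted, Perl 5.10 (released in December 2007, after this thread) added the branch-reset group (?|...), in which every alternative numbers its capture groups from the same starting point. That gives exactly the "$1 is always the host or IP, $2 is always the date" layout asked for. The sketch below assumes Perl 5.10 or later; $log_re and host_and_date are hypothetical names, and the two-pattern version above is arguably still easier to maintain.

```perl
use 5.010;    # (?| ... ) branch reset needs Perl 5.10 or later
use strict;
use warnings;

# Date in the PIX message, e.g. "Feb 28 2006 01:00:00"
my $date_pat = qr/[A-Za-z]{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}/;

# Inside (?| ... ) both alternatives start numbering at group 1,
# so the FreeBSD host name and the Webtrends IP both land in $1,
# and the date that follows is always $2.
my $log_re = qr/
    (?|
        ([\w.-]+) \x20          # FreeBSD: host name right before the date
      |
        ip= ([.\d]+) .*?        # Webtrends: client IP, then skip ahead
    )
    ($date_pat)
/x;

# Hypothetical helper: returns (host_or_ip, date), or an empty list
# when the line matches neither format.
sub host_and_date {
    my ($line) = @_;
    return $line =~ $log_re ? ($1, $2) : ();
}
```

For example, host_and_date on the FreeBSD sample line returns ('HOSTNAME', 'Feb 28 2006 01:00:00'), and on a Webtrends line with a dotted-decimal address it returns that address and 'Feb 26 2006 23:59:59'.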
Paul Lalli