Re: regexp pipe problems..



willwade@xxxxxxxxx wrote:
> OK, I have a "fairly" straightforward regular expression which grabs
> some bits out of a url. Now this should be easy but I can't see the
> wood for the trees. I have a few or's in there but it seems to be
> adding each to memory - when I only want the found one. Also - instead
> of matching just the subdomain it matches the whole domain ($1 see
> below) - whats that all about?
>
> $url = 'http://sub1.site3.org/slash/slashey2/slashey3/39/4/223';
>
> if ($url =~ m{(([^/]+).site1.org|([^/]+).site2.org|([^/]+).site3.org)
> /slash/slashey2/([^/]+)/([0-9]+)/([0-9]+)/([0-9a-zA-Z-]+)}){
> print "yay! $1,$2,$3,$4,$5";
> } else {
> print "poo";
> exit;
> }
>
> and it prints:
>
> yay! sub1.site3.org,,,sub1,

Just make it a little more readable:

m{
    (                         # this parenthesis is captured in $1
        ([^/]+).site1.org |   # here's $2
        ([^/]+).site2.org |   # that's $3
        ([^/]+).site3.org     # and $4
    )
    /slash/slashey2/
    ( [^/]+ )                 # here we get $5...
    /
    ( [0-9]+ )                # ...and so on...
    /
    ( [0-9]+ )
    /
    ( [0-9a-zA-Z-]+ )
}x;

With the /x modifier you can insert whitespace, comments
and line breaks into your regex without changing its meaning.
This is especially useful when you post regexes here,
as your expressions are no longer damaged by automated line
breaks.

Capturing parens are numbered in the order in which their opening
parens appear in the regex, from left to right. The outermost
paren opens first, which is why your example captures the FQDN in $1.
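
The numbering rule can be checked with a small script. This is only a sketch: the helper name capture_list is made up for illustration, the pattern is cut down to the hostname part, and the dots are escaped here (the original pattern left them unescaped). Groups that belong to an alternative which did not match come back undefined and are shown as empty strings.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captures of the hostname part only.
# Groups are numbered by the position of their opening parenthesis.
sub capture_list {
    my ($url) = @_;
    my @groups = $url =~ m{
        (                           # group 1: the whole host name
            ([^/]+)\.site1\.org |   # group 2
            ([^/]+)\.site2\.org |   # group 3
            ([^/]+)\.site3\.org     # group 4
        )
    }x;
    # Unmatched groups are undef; turn them into empty strings for printing.
    return map { defined $_ ? $_ : '' } @groups;
}

print join(',', capture_list('http://sub1.site3.org/slash/slashey2/slashey3/39/4/223')), "\n";
# prints: sub1.site3.org,,,sub1
```

Only the third alternative matches, so groups 2 and 3 stay empty and the subdomain ends up in group 4.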

So to work around this the easiest solution would be to
move the hostname pattern outside of the or-clause:

m{
    ([^/]+)\.(site1|site2|site3)\.org
    /slash/slashey2/
    ( [^/]+ ) / ( \d+ ) / ( \d+ ) / ( [0-9a-zA-Z-]+ )
}x;

and to change the print statement (or whatever uses the
capturing variables) to ignore $2:

print "yay! $1,$3,$4,$5,$6";
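
Put together as a complete script, this might look as follows. It is a sketch, not the only way to arrange it: the sub name parse_url is invented for the example, and the URL is the one from the original post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the rewritten pattern.
# Returns the subdomain and the four path pieces, or an empty list.
sub parse_url {
    my ($url) = @_;
    if ($url =~ m{
            ([^/]+) \. (site1|site2|site3) \. org   # groups 1 and 2
            /slash/slashey2/
            ( [^/]+ ) / ( \d+ ) / ( \d+ ) / ( [0-9a-zA-Z-]+ )
        }x) {
        return ($1, $3, $4, $5, $6);   # skip group 2, the site name
    }
    return;
}

my @parts = parse_url('http://sub1.site3.org/slash/slashey2/slashey3/39/4/223');
print @parts ? "yay! " . join(',', @parts) . "\n" : "poo\n";
# prints: yay! sub1,slashey3,39,4,223
```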

But, of course, TMTOWTDI, and "perldoc perlre" has a lot of
useful information on regular expressions.
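
One such other way, sketched here under the assumption that the site name itself is never needed: make the alternation a non-capturing group with (?:...), so the captures are simply $1 to $5 and nothing has to be skipped. The sub name parse_url_nc is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical variant: the site alternation groups without capturing,
# so a list-context match returns exactly the five wanted pieces.
sub parse_url_nc {
    my ($url) = @_;
    my @captures = $url =~ m{
        ([^/]+) \. (?:site1|site2|site3) \. org   # non-capturing alternation
        /slash/slashey2/
        ( [^/]+ ) / ( \d+ ) / ( \d+ ) / ( [0-9a-zA-Z-]+ )
    }x;
    return @captures;   # empty list when the pattern does not match
}

my @p = parse_url_nc('http://sub1.site3.org/slash/slashey2/slashey3/39/4/223');
print "yay! ", join(',', @p), "\n" if @p;
# prints: yay! sub1,slashey3,39,4,223
```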

HTH
-Chris