Re: Regular Expression



On Tue, 2 Jan 2007, zowtar wrote:

I have the matches... I want a regular expression to filter out only the
URL...

MATCH:
href="http://site.com/0,,NEWS39104-EI8090,00.html";
href="javascript:abre('http://site.com/0,,NEWS39104-EI8090,00.html','Gallery39104','660','500','no');"


CODE 01:
href="(?:.*)?((?:ftp|http|https)://(?:[^:/]+)(?::[0-9]{1,5})?(?:/.*)?.+)"

Changing your final .+ to [^'"]+ could help...

return
OK - http://site.com/0,,NEWS39104-EI8090,00.html
ERROR -
http://site.com/0,,NEWS39104-EI8090,00.html','Gallery39104','660','500','no');


CODE 02:
href="(?:.*)?((?:ftp|http|https)://(?:[^:/]+)(?::[0-9]{1,5})?(?:/.*)?.+?)(?:\',\'.*\',\'.*\',\'.*\',\'.*\'\);)?"

You seem to be using a comma for alternation, that is, to specify one of
several alternatives (I can't imagine any HTML fragment that would list
them all in that order, separated by commas). However, alternation in a
regular expression is written with |, not with a comma.
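
To illustrate the difference, here is a minimal sketch. The two patterns
and their variable names are made up for this example and do not come
from the original post:

```tcl
# Hypothetical patterns, for illustration only.
set withPipe  {^(?:ftp|http|https)$}   ;# | separates alternatives
set withComma {^(?:ftp,http,https)$}   ;# , is just a literal character

puts [regexp -- $withPipe  http]             ;# prints 1
puts [regexp -- $withComma http]             ;# prints 0
puts [regexp -- $withComma ftp,http,https]   ;# prints 1
```

The comma version only matches the literal text ftp,http,https, so it
never behaves like a choice between the three schemes.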

return
OK - http://site.com/0,,NEWS39104-EI8090,00.html
ERROR -
http://site.com/0,,NEWS39104-EI8090,00.html','Gallery39104','660','500','no');

If I were writing a regular expression to pluck out the URLs in your
example I'd use:

set RE {(?xi) # an expanded (case insensitive) regexp
(?:https?|ftp):// # protocol
[^"'/"]+ # host, possibly port, or user:pass@ for ftp
(?:/~?[a-z%0-9,._+?&=/-]+)? # other chars should be urlencoded (%## ...)
}

Note: I put double quotes in the negated character set [^"'/"] twice only
to make my syntax highlighting editor happy...
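
As a rough sketch of how this could be applied, the snippet below runs
the expression over the two sample lines from the question. The names
samples and urls are assumptions made for this example, and the
duplicate double quote in the character set is dropped here since it
does not change what is matched:

```tcl
set RE {(?xi)                     # expanded, case-insensitive regexp
    (?:https?|ftp)://             # protocol
    [^"'/]+                       # host, possibly port or credentials
    (?:/~?[a-z%0-9,._+?&=/-]+)?   # path and query characters
}

# The two href values from the question (hypothetical variable name).
set samples {
    {href="http://site.com/0,,NEWS39104-EI8090,00.html"}
    {href="javascript:abre('http://site.com/0,,NEWS39104-EI8090,00.html','Gallery39104','660','500','no');"}
}

# Collect the first URL found in each line.
set urls {}
foreach line $samples {
    if {[regexp -- $RE $line url]} {
        lappend urls $url
    }
}
puts [join $urls \n]
```

Both lines should yield http://site.com/0,,NEWS39104-EI8090,00.html,
because the path part stops at the first quote character.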

Michael