Re: Regular Expressions

From: Tim Van Wassenhove (euki_at_pi.be)
Date: 03/11/04


Date: 10 Mar 2004 23:01:31 GMT

On 2004-03-10, Andrew Bullock <trullock@yahoo.com> wrote:
> Hi
>
> I'm REALLY stuck with regular expressions!
>
> Can someone please give me a pointer with this...
>
> I'm trying to extract a load of images from a site, which are inside links.
> i.e. I want to extract the URL and the image path
>
> typical hrefs would be:
>
>
><a target=_top href="test.html"><img border=0 src="clickme.jpg"></a>
><a href='test.html'> <img src=clickme.jpg></a>
>
> $matchstr =
> '/<a\s+.*?href=[\"\'s]?(.*?)\"?+src=[\"\'s]?(.*?)\"(.*?)\>\<\/a\>/i';
>

Untested, but you can give this a try anyway (Perl-style regex):

<a\s+.*?href=[\"\'](.*?)[\"\'].*?\>.*?<img\s+.*?src=[\"\'](.*?)[\"\'].*?\><\/a>
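
Note that the pattern above expects a quote around each attribute value, so it would skip the unquoted src=clickme.jpg in the second sample line. A minimal sketch of one way to accept double-quoted, single-quoted and unquoted values, written in Python purely for illustration. The names LINK_IMG and extract_links are made up here, and the pattern is an assumption rather than anything from the thread; it has only been checked against the two sample lines quoted above.

```python
import re

# Hypothetical pattern (not from the original post). Each attribute value is
# matched by one of three alternatives: "double-quoted", 'single-quoted',
# or an unquoted run of characters up to whitespace or '>'.
LINK_IMG = re.compile(
    r"""<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>"""
    r"""\s*<img\s[^>]*?src\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>"""
    r"""\s*</a>""",
    re.IGNORECASE,
)


def extract_links(html):
    """Return a list of (href, src) pairs for each <a ...><img ...></a>."""
    pairs = []
    for m in LINK_IMG.finditer(html):
        # Exactly one of the three alternatives captured a value per attribute.
        href = next(g for g in m.group(1, 2, 3) if g is not None)
        src = next(g for g in m.group(4, 5, 6) if g is not None)
        pairs.append((href, src))
    return pairs


# The two sample lines from the question.
sample = '''<a target=_top href="test.html"><img border=0 src="clickme.jpg"></a>
<a href='test.html'> <img src=clickme.jpg></a>'''

print(extract_links(sample))
# [('test.html', 'clickme.jpg'), ('test.html', 'clickme.jpg')]
```

The same three-way alternation should carry over to a pattern for PHP's preg_match_all, though that is not tested here. For anything beyond simple, regular markup a real HTML parser is likely to be more robust than a regex.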

-- 
http://home.mysth.be/~timvw

