Regular Expressions

From: Andrew Bullock (trullock_at_yahoo.com)
Date: 03/10/04


Date: Wed, 10 Mar 2004 14:32:01 -0000

Hi

I'm REALLY stuck with regular expressions!

Can someone please give me a pointer with this...

I'm trying to extract a load of images from a site, where each image is
wrapped in a link, i.e. I want to extract the link URL and the image path.

Typical links would be:

<a href="test.html"><img src="clickme.jpg"></a>
<a target=_top href="test.html"><img border=0 src="clickme.jpg"></a>
<a href='test.html'> <img src=clickme.jpg></a>

As you can see, there may be double quotes, single quotes or no quotes
surrounding the image and URL paths, so I need to be able to account for
this.

Also, I only want to return info where the image to click is a JPEG
and the URL is an HTML page.

Does that make sense?

$matchstr =
'/<a\s+.*?href=[\"\'s]?(.*?)\"?+src=[\"\'s]?(.*?)\"(.*?)\>\<\/a\>/i';

That almost works, but it returns this:

test.html"><img
thumbs/16.jpg>

How can I fix this?

Many thanks in advance

Andrew
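
One possible direction, offered as a sketch rather than a definitive fix: in the pattern above, [\"\'s] is a character class that contains a literal "s" (not whitespace), and the first (.*?) has nothing to stop it at the closing quote, so it keeps going until it reaches src=. The version below captures the opening quote (if any), requires the same quote to close the value via a backreference, and only accepts values ending in .htm/.html and .jpg/.jpeg. The sample input and the names $html and $links are made up for illustration, and it assumes the <img> is the only thing inside the link.

```php
<?php
// Hypothetical test input: the three example links from the post,
// plus two that should be skipped (non-HTML URL, non-JPEG image).
$html = <<<'EOT'
<a href="test.html"><img src="clickme.jpg"></a>
<a target=_top href="test.html"><img border=0 src="clickme.jpg"></a>
<a href='test.html'> <img src=clickme.jpg></a>
<a href="test.php"><img src="clickme.jpg"></a>
<a href="test.html"><img src="clickme.gif"></a>
EOT;

// Group 1/3: optional opening quote, reused as \1 and \3 to close the value.
// Group 2:   URL ending in .htm or .html.
// Group 4:   image path ending in .jpg or .jpeg.
// [^>]* keeps each part of the match inside its own tag.
$matchstr = '~<a\s[^>]*?\bhref=(["\']?)([^"\'\s>]+\.html?)\1[^>]*>\s*'
          . '<img\s[^>]*?\bsrc=(["\']?)([^"\'\s>]+\.jpe?g)\3[^>]*>\s*</a>~i';

preg_match_all($matchstr, $html, $matches, PREG_SET_ORDER);

// Collect just the URL and image path for each matching link.
$links = [];
foreach ($matches as $m) {
    $links[] = ['url' => $m[2], 'image' => $m[4]];
}

foreach ($links as $link) {
    echo $link['url'], ' ', $link['image'], "\n";
}
```

On the sample input this should pick up the first three links and skip the last two. Regular expressions stay fragile against real-world HTML (extra attributes with ">" inside them, text next to the image, and so on), so for anything beyond a quick scrape an HTML parser such as PHP's DOMDocument may be the safer route.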


