Re: how to count and extract images



Joe <dinamo99@xxxxxxxxx> writes:
> start = s.find('<a href="somefile') + len('<a href="somefile')
> stop = s.find('">Save File</a></B>', start)
> fileName = s[start:stop]
> and then construct the URL with the filename to download the image,
> which works fine because every image has the Save File link and I can
> count the number of images easily. The problem is when there is more
> than one image: I try using a while loop to download the files, and it
> works fine for the first one but always matches the same one. How can I
> count and tell the loop to skip the first one if it has been downloaded
> and go to the next one, and if the next one is downloaded go to the
> next one, and so on?

To answer your question: pass the first optional argument to find (the
index at which to start searching) in both invocations, so that each
search resumes where the previous match ended:

marker = '<a href="somefile'
stop = 0
while True:
    start = s.find(marker, stop)
    if start < 0:
        break    # find returns -1 when there are no more links
    start += len(marker)
    stop = s.find('">Save File</a></B>', start)
    fileName = s[start:stop]
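
As a self-contained sketch of the same loop, the function below collects every match into a list, so the number of images is simply the length of the result. The function name extract_filenames and the sample page are made up for illustration; the two marker strings are the ones from the question.

```python
def extract_filenames(s, prefix='<a href="somefile',
                      suffix='">Save File</a></B>'):
    """Return every substring found between prefix and suffix, in order."""
    names = []
    pos = 0
    while True:
        start = s.find(prefix, pos)      # resume after the previous match
        if start < 0:
            break                        # no more links
        start += len(prefix)
        stop = s.find(suffix, start)
        if stop < 0:
            break                        # opening marker without a closing one
        names.append(s[start:stop])
        pos = stop + len(suffix)
    return names


# Hypothetical sample input, not taken from the real page.
page = ('<B><a href="somefile1.jpg">Save File</a></B>'
        '<B><a href="somefile2.jpg">Save File</a></B>')
names = extract_filenames(page)
print(len(names), names)   # prints: 2 ['1.jpg', '2.jpg']
```

Because pos advances past each match, a file that has already been seen is never matched again.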

Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(s)
for anchor in soup.fetch('a'):
    fileName = anchor['href']

to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:

soup = BeautifulSoup(s)
for link in soup.fetchText('Save File'):
    fileName = link.findParent('a')['href']
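
If installing a third-party library is not an option, roughly the same thing can be sketched with the standard library's html.parser module. This is a substitute for BeautifulSoup, not its API; the class name SaveFileLinks and the helper save_file_links are invented for this example, and it assumes the link text contains the phrase "Save File".

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text contains a phrase."""

    def __init__(self, phrase="Save File"):
        super().__init__()
        self.phrase = phrase
        self.hrefs = []
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text seen inside that <a> so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.phrase in "".join(self._text):
                self.hrefs.append(self._href)
            self._href = None


def save_file_links(html):
    """Return the hrefs of all 'Save File' links in html, in document order."""
    parser = SaveFileLinks()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

Calling save_file_links(s) returns the full href values (including the "somefile" prefix), and len() of that list gives the image count.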

<mike
--
Mike Meyer <mwm@xxxxxxxxx> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.