Re: how to count and extract images
- From: Mike Meyer <mwm@xxxxxxxxx>
- Date: Sun, 23 Oct 2005 20:55:21 -0400
Joe <dinamo99@xxxxxxxxx> writes:
> start = s.find('<a href="somefile') + len('<a href="somefile')
> stop = s.find('">Save File</a></B>', start)
> fileName = s[start:stop]
> and then construct the URL with the filename to download the image,
> which works fine because every image has the Save File link and I can
> count the number of images easily. The problem is when there is more than
> one image: I try using a while loop to download the files, and it works
> fine for the first one but always matches the same one. How can I count
> and tell the loop to skip the first one if it has been downloaded and go
> to the next one, and if the next one is downloaded go to the next one,
> and so on?
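To answer your question, pass the optional start argument to find in both
invocations, so that each search resumes where the previous match ended:
    prefix = '<a href="somefile'
    stop = 0
    while True:
        start = s.find(prefix, stop)        # resume where the last match ended
        if start < 0:
            break                           # no more links
        start += len(prefix)
        stop = s.find('">Save File</a></B>', start)
        fileName = s[start:stop]
        # download fileName here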
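As a self-contained sketch of the loop above, the same idea can be wrapped in a function that collects every match, so the image count is simply the length of the result. The function name extract_filenames, the PREFIX/SUFFIX constants, and the sample HTML are made up for illustration; only the str.find(sub, start) technique and the two marker strings come from this thread.

```python
PREFIX = '<a href="somefile'       # text just before each file name (from the question)
SUFFIX = '">Save File</a></B>'     # text just after each file name (from the question)


def extract_filenames(s):
    """Return every substring found between PREFIX and SUFFIX, in order."""
    names = []
    pos = 0
    while True:
        start = s.find(PREFIX, pos)    # search from the end of the last match
        if start < 0:
            break                      # no further links
        start += len(PREFIX)
        stop = s.find(SUFFIX, start)
        if stop < 0:
            break                      # opening marker without a closing one
        names.append(s[start:stop])
        pos = stop + len(SUFFIX)
    return names


# Hypothetical sample input, not taken from the original page.
sample = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
names = extract_filenames(sample)
print(len(names), names)
```

Like the original snippet, this returns only the part of the href that follows "somefile", so the URL still has to be rebuilt from it.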
Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:
    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup(s)
    for anchor in soup.fetch('a'):
        fileName = anchor['href']
to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:
    soup = BeautifulSoup(s)
    for link in soup.fetchText('Save File'):
        fileName = link.findParent('a')['href']
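If installing a third-party library is not an option, roughly the same filtering can be sketched with the parser that ships with Python 3, html.parser. This is a swapped-in alternative, not the BeautifulSoup approach above; the class name SaveFileLinks and the sample HTML are hypothetical, and the sketch assumes the link text is exactly "Save File".

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a':
            text = ''.join(self._text).strip()
            if self._href is not None and text == 'Save File':
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical sample input, not taken from the original page.
sample = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<a href="other.html">Elsewhere</a>'
    '<B><A HREF="somefile2.jpg">Save File</A></B>'
)
parser = SaveFileLinks()
parser.feed(sample)
parser.close()
print(len(parser.hrefs), parser.hrefs)
```

Unlike the find loop, this does not care about attribute order, letter case in the tags, or whether the link is wrapped in <B>, which is the kind of minor variation a real parser absorbs.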
<mike
--
Mike Meyer <mwm@xxxxxxxxx> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.