Re: need start point for getting html info from web
- From: Mike Meyer <mwm@xxxxxxxxx>
- Date: Sun, 30 Oct 2005 21:36:49 -0500
nephish@xxxxxxx writes:
> I have a small app that is going to need to get information from a
> few tables on different websites. I have looked at urllib and httplib.
> The sites I need to get data from mostly have this data in tables, so
> that, I think, would make it easier. Can anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?
Don't have a link to a howto, but you're halfway there. urllib (and
urllib2) will fetch the HTML text from the websites. How you pull data
out of it depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite XML library. If it's well-structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup; I've used it to pull data
from tables. The catch with any of these approaches is that your code
depends on the structure, or lack thereof, of the HTML you're
scraping. If the site changes its markup, your code breaks.
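A minimal sketch of the fetch-then-parse approach in modern Python 3, where urllib2 has been folded into urllib.request and htmllib has been removed in favour of html.parser. To stay standard-library only, it uses html.parser rather than BeautifulSoup, so it assumes reasonably regular table markup: it tolerates omitted </td> and </tr> tags but does not handle nested tables. The names TableExtractor, parse_tables and fetch_tables are hypothetical helpers for illustration, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Hypothetical helper: assumes tables are not nested (an inner <table>
    would discard the rows gathered so far for the outer one).
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.tables = []    # finished tables
        self._rows = None   # rows of the table currently open
        self._row = None    # cells of the row currently open
        self._cell = None   # text fragments of the cell currently open

    def _finish_cell(self):
        # Close the open cell, collapsing runs of whitespace in its text.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        # Close the open row (and any cell still open inside it).
        self._finish_cell()
        if self._row is not None and self._rows is not None:
            self._rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._finish_row()   # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # an unclosed previous <td>/<th> ends here
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()
        elif tag == "table" and self._rows is not None:
            self._finish_row()
            self.tables.append(self._rows)
            self._rows = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_tables(url):
    """Download a page and return its tables (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return parse_tables(html)


sample = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>1.50</td></tr>
  <tr><td>Gadget &amp; Co</td><td>2.75
</table>
"""
print(parse_tables(sample))
# [[['Name', 'Price'], ['Widget', '1.50'], ['Gadget & Co', '2.75']]]
```

Calling fetch_tables(url) on a live page returns the same list-of-rows structure. Picking the right table and columns out of it is still tied to the site's markup, which is exactly the fragility described above; for badly broken HTML, BeautifulSoup is the more forgiving parser.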
<mike
--
Mike Meyer <mwm@xxxxxxxxx> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.