Re: need start point for getting html info from web



nephish@xxxxxxx writes:
> i have a small app that i am going to need to get information from a
> few tables on different websites. i have looked at urllib and httplib.
> the sites i need to get data from mostly have this data in tables. So
> that, i think would make it easier. Anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?

I don't have a link to a howto, but you're halfway there. urllib (and
urllib2) will fetch the HTML text from the websites. How you pull data
out of it depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite XML library. If it's well-structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup; I've used it to pull data
from tables. The problem with any of these approaches is that your code
depends on the structure (or lack thereof) of the HTML you're
scraping. If the site changes it, your code breaks.
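
For the well-structured case, here is a rough sketch of the idea, not a
finished tool. It assumes Python 3, where urllib2's fetching lives in
urllib.request and htmllib is gone (html.parser is the standard-library
parser there). The names TableExtractor, extract_tables, fetch and SAMPLE
are made up for illustration, and it assumes every cell and row has a
closing tag and that tables are not nested.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collect cell text from <td>/<th> tags, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of strings)
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html_text):
    """Return a list of tables; assumes closed tags and no nested tables."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables


def fetch(url):
    """Download a page and decode it (falls back to UTF-8 as a guess)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>apples</td><td>3</td></tr></table>"
)
rows = extract_tables(SAMPLE)[0]
```

With a real site you would pass fetch(url) to extract_tables instead of
SAMPLE. Markup that omits closing tags or nests tables will confuse this
sketch, which is where something like BeautifulSoup is the better fit.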

<mike
--
Mike Meyer <mwm@xxxxxxxxx> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.