Re: Parse text from HTML website, dump into DB

From: Michael Vilain (vilain_at_spamcop.net)
Date: 03/16/04


Date: Tue, 16 Mar 2004 09:46:28 -0800

In article <105cvgrjkbbh8e6@corp.supernews.com>,
 "IceOnFire" <af@iceonfire.net> wrote:

> I am working on a script to extract statistics (which are updated daily)
> from a website and insert them into a MySQL database. I want to take this
> website:
> http://www.usatoday.com/sports/basketball/nba/stats/allplayers0304.htm
> strip off all the HTML tags and so on, make it look like
> http://www.enlhoops.com/ratings/parsed.txt
> and then insert each player's stat line into the database.
>
> I have begun writing the script, getting the file and stripping the HTML
> tags off, but that doesn't seem to work too well. If anyone can help me get
> started, suggest a function or anything else, that would be helpful. Thanks.
>
> IceOnFire
>
>

Use Perl. It's better suited to this sort of thing and can run
independently from the command line.

CPAN modules such as LWP::UserAgent and HTTP::Cookies let Perl access
sites as if it were a browser, including accepting cookies.
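
To give a feel for the fetch, parse, and insert steps, here is a rough
sketch. It is written in Python rather than Perl, purely as an
illustration that runs on a standard library alone; the same shape
carries over to Perl. Several things in it are assumptions, not taken
from your site: the sample HTML and its three columns are made up, the
player_stats table and its column names are hypothetical, SQLite stands
in for MySQL, and the parser assumes every <tr> and <td> is properly
closed, which older pages often don't do.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar


def fetch(url):
    """Fetch a page with a cookie jar attached, much as a browser would."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    with opener.open(url) as response:
        # Assumption: the page is Latin-1 encoded; adjust for the real site.
        return response.read().decode("latin-1")


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def store(conn, rows):
    """Insert (player, team, ppg) rows; the schema is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player_stats "
        "(player TEXT, team TEXT, ppg REAL)"
    )
    conn.executemany(
        "INSERT INTO player_stats VALUES (?, ?, ?)",
        [(player, team, float(ppg)) for player, team, ppg in rows],
    )
    conn.commit()


# Made-up sample standing in for the real stats page.
SAMPLE = """
<table>
<tr><th>Player</th><th>Team</th><th>PPG</th></tr>
<tr><td><b>Player A</b></td><td>AAA</td><td>21.5</td></tr>
<tr><td>Player B</td><td>BBB</td><td>9.0</td></tr>
</table>
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)          # or parse_rows(fetch(url))
    conn = sqlite3.connect(":memory:")
    store(conn, rows[1:])              # skip the header row
    for record in conn.execute("SELECT * FROM player_stats"):
        print(record)
```

Walking the table cell by cell, instead of stripping tags from the whole
page, keeps each stat in its own column, so a row maps directly onto one
INSERT.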

-- 
DeeDee, don't press that button!  DeeDee!  NO!  Dee...