RE: extract web pages from a web site
- From: info@xxxxxxxxxxxx (Siegfried Heintze)
- Date: Thu, 15 Sep 2005 10:46:04 -0600
I recommend Lincoln Stein's book "Network Programming with Perl".
Even if you are too cheap to buy his book, you can google for it and
download the source code for an example program that uses HTML::Parser to
extract and download all the gif files from a page. His example actually
parses the HTML and it sounds like you are not interested in that part.
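The following is not Stein's code. It is only a minimal sketch of the extraction step, using HTML::Parser's version 3 callback API. `gif_links` is a hypothetical helper name, and deciding "is a GIF" by a `.gif` suffix on the URL is an assumption (it ignores query strings and images served without that extension).

```perl
use strict;
use warnings;
use HTML::Parser;

# Collect the src attribute of every <img> whose URL ends in ".gif".
sub gif_links {
    my ($html) = @_;
    my @gifs;
    my $p = HTML::Parser->new(
        api_version => 3,
        # For each start tag, pass the (lowercased) tag name and attribute hash.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @gifs, $attr->{src}
                    if $tag eq 'img'
                    && defined $attr->{src}
                    && $attr->{src} =~ /\.gif\z/i;
            },
            'tagname, attr',
        ],
    );
    $p->parse($html);
    $p->eof;    # flush any text still buffered by the parser
    return @gifs;
}

print "$_\n" for gif_links('<p><img src="logo.gif"><img src="photo.jpg"></p>');
```

The URLs come back exactly as written in the page, so relative ones still need resolving against the page's address before they can be downloaded.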
I looked at WWW::Mechanize and was dismayed because it looked like it was
extremely specific. It only had a few functions and was not general purpose.
Siegfried
-----Original Message-----
From: Scott R. Godin [mailto:nospam@xxxxxxxxxxxxx]
Sent: Wednesday, September 14, 2005 9:33 PM
To: beginners@xxxxxxxx; jose.pinto@xxxxxxxxxx
Subject: Re: extract web pages from a web site
José Pedro Silva Pinto wrote:
> Hi there,
>
> I am writing a Perl program to extract some web pages (and copy them to a
> local file) from a given web address.
>
> Which Perl module can I use to help me with this task?
It depends on what you're looking to do...
LWP::Simple to grab stuff with; WWW::Mechanize and HTML::TokeParser or
HTML::Parser to interact with it and pick apart the results.
If you simply want to download and store the web page, wouldn't you also want
to store the attendant image/CSS/JavaScript/embedded files that it references
externally?
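A minimal sketch of that approach, assuming the script is given a page URL and an output file name on the command line (`resource_links` and the script name `mirror_page.pl` are hypothetical). LWP::Simple's `getstore` saves the page, and HTML::LinkExtor (which ships with HTML::Parser) lists the images, scripts and `<link>` targets it references, resolved to absolute URLs. It only lists those files; fetching each one and rewriting the links to point at local copies is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use HTML::LinkExtor;

# Return the absolute URLs of the images, scripts and <link> targets
# (stylesheets, but also icons etc.) referenced by $html.
sub resource_links {
    my ($html, $base) = @_;
    my @links;
    # Because $base is given, HTML::LinkExtor hands the callback absolute
    # URI objects, as attribute => URL pairs for the tag's link attributes.
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            my $url = $tag eq 'img'    ? $attr{src}
                    : $tag eq 'script' ? $attr{src}
                    : $tag eq 'link'   ? $attr{href}
                    :                    undef;
            push @links, "$url" if defined $url;    # stringify the URI
        },
        $base,
    );
    $extor->parse($html);
    $extor->eof;
    return @links;
}

# Usage: perl mirror_page.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $status = getstore($url, $file);
    die "GET $url failed with HTTP status $status\n" unless is_success($status);
    open my $fh, '<', $file or die "Cannot read $file: $!\n";
    my $html = do { local $/; <$fh> };
    close $fh;
    # A full mirror would also getstore() each of these files.
    print "$_\n" for resource_links($html, $url);
}
```

For example, `perl mirror_page.pl http://www.example.com/ page.html` saves the page to `page.html` and prints the external files it pulls in.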