Intending to Upload: WWW::Spider
- From: bytbox <bytbox@xxxxxxxxx>
- Date: Sun, 5 Jul 2009 00:49:12 -0700 (PDT)
I intend to upload the WWW::Spider module, whose header is shown below,
to CPAN. I am looking for comments, complaints, and ideas about the
namespace and the usefulness of this module (e.g., any missing or
unnecessary functionality).
Thanks to all.
package WWW::Spider;

=head1 NAME

WWW::Spider - customizable Internet spider

=head1 VERSION

This document describes C<WWW::Spider> version 0.01_01

=head1 SYNOPSIS

    use WWW::Spider;
    use LWP::UserAgent;

    # configuration
    my $spider = WWW::Spider->new;
    $spider = WWW::Spider->new({ UASTRING => 'mybot' });
    print $spider->uastring;
    $spider->uastring('New UserAgent String');
    $spider->user_agent(LWP::UserAgent->new);

    # basic usage
    print $spider->get_page_response('http://search.cpan.org/')->content;
    print $spider->get_page_content('http://search.cpan.org/');
    my @links = $spider->get_links_from('http://google.com/'); # array of URLs
    my $graph = $spider->create_graph_for('http://perl.org');

=head1 DESCRIPTION

WWW::Spider is a customizable Internet spider intended for fetching and
analyzing websites. Features include:

=over

=item * basic methods for high-level HTML handling

=item * a customizable mechanism for retrieving pages

=item * callbacks for events such as a page being fetched or an error occurring

=item * caching

=item * a high-level implementation of a "graph" of either pages or
sites (as defined by the callback) that can be analyzed

=back
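As a rough illustration of how the callback, caching, and graph features listed above could fit together, here is a minimal sketch in core Perl. It is not WWW::Spider's API: `cached`, `links_in`, `build_graph`, the `on_page`/`on_error` callback names, and the `.example` URLs are all hypothetical. Pages come from an in-memory hash instead of LWP::UserAgent so the sketch runs offline, and links are pulled out with a crude regex where real code should use an HTML parser such as HTML::LinkExtor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the web: URL => HTML. A missing key simulates
# a fetch error. This is NOT how WWW::Spider retrieves pages.
my %fake_web = (
    'http://a.example/' => '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    'http://b.example/' => '<a href="http://c.example/">c</a>',
    'http://c.example/' => '<a href="http://a.example/">a</a> <a href="http://missing.example/">x</a>',
);
my $fetch_count = 0;
my $fetcher = sub {    # any code ref mapping URL -> HTML (or undef on error)
    my ($url) = @_;
    $fetch_count++;
    return $fake_web{$url};
};

# Caching: wrap a fetcher so each URL is retrieved at most once
# (failed fetches are cached as undef, too).
sub cached {
    my ($fetch) = @_;
    my %cache;
    return sub {
        my ($url) = @_;
        $cache{$url} = $fetch->($url) unless exists $cache{$url};
        return $cache{$url};
    };
}

# Crude regex link extraction; an HTML parser is the robust choice.
sub links_in {
    my ($html) = @_;
    return $html =~ /href="([^"]+)"/g;
}

# Breadth-first crawl producing an adjacency list: page => [linked URLs].
# on_page and on_error are optional callbacks.
sub build_graph {
    my (%args) = @_;
    my ($start, $fetch) = @args{qw(start fetch)};
    my $max = defined $args{max_pages} ? $args{max_pages} : 100;
    my (%graph, %seen);
    my @queue = ($start);
    $seen{$start} = 1;
    while (@queue && keys(%graph) < $max) {
        my $url  = shift @queue;
        my $html = $fetch->($url);
        if (!defined $html) {
            $args{on_error}->($url) if $args{on_error};
            next;
        }
        $args{on_page}->($url) if $args{on_page};
        my @links = links_in($html);
        $graph{$url} = \@links;
        for my $link (@links) {
            push @queue, $link unless $seen{$link}++;
        }
    }
    return \%graph;
}

# Usage: crawl from a.example, recording visited pages and errors.
my $get = cached($fetcher);
my (@visited, @errors);
my $graph = build_graph(
    start     => 'http://a.example/',
    fetch     => $get,
    max_pages => 10,
    on_page   => sub { push @visited, $_[0] },
    on_error  => sub { push @errors,  $_[0] },
);
for my $page (sort keys %$graph) {
    print "$page -> @{ $graph->{$page} }\n";
}
```

Keeping the fetcher as a plain code ref is one way to make retrieval customizable: swapping the hash lookup for a call to an LWP::UserAgent object would not change `cached` or `build_graph`. Grouping pages into site-level nodes, as the description mentions, could be done by mapping each URL through another callback before adding it to the graph; that is not shown here.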