Script Help
From: Kev (karigna_at_verizon.net)
Date: 10/30/03
Date: Thu, 30 Oct 2003 21:22:35 GMT
I'm writing a script that is part of a larger script to index a defined list
of websites. The portion I'm working on finds all pages ending in
.htm / .html so that I can search those pages and index them. So far the
script maps out all the links on a page. Can anyone help with eliminating
the links that don't end in .htm / .html?
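#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use LWP::Simple;

my $base_url = "http://www.cnn.com";
my %seen;

# Fetch the page; get() returns undef on failure.
my $html = get($base_url);
die "Could not fetch $base_url\n" unless defined $html;

# Passing $base_url makes the parser return absolute URLs.
my $parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse($html)->eof;

# Each link is an array ref: [tag, attr_name => attr_value, ...]
foreach my $linkarray ($parser->links)
{
    my @element = @$linkarray;
    my $elt_type = shift @element;
    while (@element)
    {
        my ($attr_name, $attr_value) = splice(@element, 0, 2);
        $seen{$attr_value}++;
    }
}

# Print each unique link once, sorted.
for (sort keys %seen)
{
    print $_, "\n";
}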
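For illustration, the filtering I'm after might look something like the
sketch below. The helper name keep_html_links and the sample URLs are made
up, and I'm assuming the test should ignore any query string or fragment,
so treat it as a rough idea and not a finished answer.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only links whose path ends in .htm or .html.
sub keep_html_links {
    my @links = @_;
    my @kept;
    for my $link (@links) {
        # Work on a plain string copy so URI objects are left untouched.
        my $path = "$link";
        # Assumption: drop any query string or fragment before testing.
        $path =~ s/[?#].*//s;
        push @kept, $link if $path =~ /\.html?$/i;
    }
    return @kept;
}

# Sample URLs, made up for illustration.
my @sample = (
    "http://www.example.com/index.html",
    "http://www.example.com/logo.gif",
    "http://www.example.com/news/story.htm?id=1",
    "http://www.example.com/about/",
);
print "$_\n" for keep_html_links(@sample);
```

In the script above this would replace the final loop with something like
"print "$_\n" for keep_html_links(sort keys %seen);".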
K.