Re: Extracting a table from a webpage
- From: Ben Bullock <benkasminbullock@xxxxxxxxx>
- Date: Mon, 28 Apr 2008 22:06:53 +0000 (UTC)
On Mon, 28 Apr 2008 13:50:57 -0700, googlinggoogler@xxxxxxxxxxx wrote:
> I would like to scrape all the values from the table
> http://www.morningstar.co.uk/UK/ISAQuickrank/default.aspx?tab=2&sortby=ReturnM60&lang=en-GB
> But I'm having difficulty getting HTML::TableExtract to achieve this; I
> keep getting null values back.
It's difficult to analyze your problem without seeing the code you are
using. HTML::TableExtract shouldn't have a problem getting that table
out. I happened to have an old table extracting script lying around,
which I've modified for your case:
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TableExtract;
use LWP::Simple;

# Cache the page locally so repeated runs don't refetch it.
my $isafilename = "isa.html";
if (!-f $isafilename) {
    my $isaurl = "url goes here";
    my $isadata = get($isaurl);
    # LWP::Simple's get() returns undef on failure.
    die "Could not fetch $isaurl\n" unless defined $isadata;
    open my $isafile, ">", $isafilename or die $!;
    print $isafile $isadata;
    close $isafile or die $!;
}

# Parse every table in the page and report its coordinates and row count.
my $te = HTML::TableExtract->new();
$te->parse_file($isafilename);
foreach my $ts ($te->tables) {
    print "Table found at ", join(',', $ts->coords), " with ";
    print scalar(@{$ts->rows}), " rows\n";
}
This worked correctly for me & found four tables in the page.
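Once the tables are found, the next step is pulling the rows out of the one you want. Here is a rough sketch of how that might look, using the headers option of HTML::TableExtract to pick a table by its column names. The HTML below is a made-up stand-in and the column names "Fund Name" and "Return" are assumptions; the real page's headers will need to be substituted.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TableExtract;

# Stand-in HTML for illustration only; the real page's markup and
# column names are not known here and will differ.
my $html = <<'EOF';
<html><body>
<table><tr><td>navigation</td></tr></table>
<table>
<tr><th>Fund Name</th><th>Return</th></tr>
<tr><td>Example Fund A</td><td>12.3</td></tr>
<tr><td>Example Fund B</td><td>4.5</td></tr>
</table>
</body></html>
EOF

# Only tables with a row containing these headers are extracted.
# The header row itself is left out of the returned rows.
my $te = HTML::TableExtract->new(headers => ['Fund Name', 'Return']);
$te->parse($html);
$te->eof;

my @funds;
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        # Empty cells come back as undef, so substitute ''.
        push @funds, [ map { defined $_ ? $_ : '' } @$row ];
    }
}
print join("\t", @$_), "\n" foreach @funds;
```

If no table matches the headers, $te->tables returns an empty list, which is one possible cause of getting nothing back.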
> The other thing is I want to get all the pages; as you can see from that
> page, there are something like 3800 lines in the table.
> I have already tried to manipulate my HTTP POSTs with the Firefox
> plugin Tamper Data (great extension, comes highly recommended!) but the
> script that serves that page is well written and guards against this. So
> I tried to look at the HTTP transfers triggered by the "next" button at
> the bottom. This led me to find that it produces an absolutely
> massive string that I can't even begin to understand, plus I think it
> uses some sort of validation process based on the field names (e.g.
> "__EVENTVALIDATION")
Hmm, I manually changed the tab= string in the URL, to "tab=2" and
"tab=3" etc. and got the subsequent tables correctly, so it doesn't seem
to me that they are trying to hide the data.
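The tab-changing approach could be scripted along these lines. This is only a sketch: it assumes the query string from the original question belongs to the address given there, and the range 1..3 is a guess, since the real number of tabs is not known here. The tab_url helper is made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $base = 'http://www.morningstar.co.uk/UK/ISAQuickrank/default.aspx';

# Build the URL for one tab. The sortby and lang values are copied
# from the URL in the original question.
sub tab_url {
    my ($tab) = @_;
    return "$base?tab=$tab&sortby=ReturnM60&lang=en-GB";
}

# The range 1..3 is an assumption; adjust it to the tabs that exist.
my @urls = map { tab_url($_) } 1 .. 3;
print "$_\n" foreach @urls;
```

Each URL can then be passed to get() from LWP::Simple and the result handed to HTML::TableExtract, as in the script above.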