Re: Delayed WEB Page Response
- From: damians@xxxxxxxxxxxxx
- Date: Wed, 20 Feb 2008 09:01:06 -0800 (PST)
On Feb 7, 5:00 pm, Mark Clements <mark.clementsREMOVET...@xxxxxxxxxx>
wrote:
aage.gribs...@xxxxxxxxx wrote:
I wish to capture data from a Web page, e.g.
"http://www.eppraisal.com/PropertyInfo.aspx?a=1215%20Jefferson%20Ave&z=46201"
I am using the LWP modules.
The page responds in three steps and I have succeeded in capturing
only the first.
The page first paints up nicely with "Loading" text in the area of
interest.
After a delay the "Loading" text is replaced with "Calculating".
Shortly thereafter, sometimes apparently instantaneously, the data of
interest appears.
I have tried LWP::UserAgent and LWP::Parallel::UserAgent and capture
only the initial response.
Changing the timeout parameters does not change the behavior.
The callback subroutine indicates the HTML comes in several chunks.
How can the other responses be captured?
The documentation mentions LWP::Parallel::UserAgent::Entry objects
and follow-up requests.
Will this be of help?
I have found no documentation of this feature.
Is there any additional documentation or examples?
It's using JavaScript - which neither LWP nor WWW::Mechanize will
execute - to move between pages. You could try using
Win32::IE::Mechanize or Selenium, but both of these rely on controlling
a running browser.
Mark
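Mark's point can be checked without a browser: if the markup LWP receives still contains the "Loading" placeholder, the data was never in that response and is filled in later by a script. Below is a minimal sketch of that check, not a definitive tool. The helper names (script_sources, still_loading), the sample HTML, and the "/js/values.js" path are made up for illustration and are not taken from the real eppraisal.com page. A regex stands in for a proper HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the src attribute of every <script> tag found in $html.
# A regex is good enough for a quick look; it is not a real HTML parser.
sub script_sources {
    my ($html) = @_;
    my @src;
    while ($html =~ /<script\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gis) {
        push @src, $2;
    }
    return @src;
}

# True if the placeholder text is still in the markup, i.e. the data
# was not part of this response and has to arrive via script.
sub still_loading {
    my ($html) = @_;
    return $html =~ /\bLoading\b/ ? 1 : 0;
}

# Made-up sample standing in for what LWP gets back (NOT the real page).
my $sample = <<'HTML';
<html><head><script type="text/javascript" src="/js/values.js"></script></head>
<body><div id="estimate">Loading</div></body></html>
HTML

# With a URL on the command line, fetch it with LWP instead of the sample.
my $html = $sample;
if (@ARGV) {
    require LWP::UserAgent;    # loaded only when a fetch is requested
    my $res = LWP::UserAgent->new(timeout => 30)->get($ARGV[0]);
    die "Fetch failed: ", $res->status_line, "\n" unless $res->is_success;
    $html = $res->decoded_content;
}

print "placeholder still present: ", (still_loading($html) ? "yes" : "no"), "\n";
print "script: $_\n" for script_sources($html);
```

Run with no arguments it inspects the built-in sample; run with a URL as the first argument it inspects the fetched page. The listed script files are where one would look for the follow-up request the browser makes, which can then be requested directly with LWP if the site permits it.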
There is an API to some of our data. What data elements are you
looking to pull?
Send an email to me or to info (at) eppraisal.com. Scraping the front-end
is time-consuming and prone to errors (when we push out updates).
Damian (from eppraisal.com)