Re: web crawling program
- From: George Maicovschi <georgemaicovschi@xxxxxxxxx>
- Date: Mon, 31 Mar 2008 11:05:56 -0700 (PDT)
On Mar 31, 6:25 pm, Gary L. Burnore <gburn...@xxxxxxxxxxxxx> wrote:
On Mon, 31 Mar 2008 07:14:54 -0700 (PDT), raven <rvnsn...@xxxxxxxxx>
wrote:
Thank you Jerry and George for your quick responses. The 100 options
in the form correspond to cities, and the form on the site doesn't
allow you to make a query without selecting a city first. Since I
don't know the city (that is actually what I am searching for), a
specific query takes about 50 submits on average to find, and I have
over 100 queries. Whether it is the remote site designer's mistake or
me being evil :( it is unbearable to go through them one by one by hand.
My bet is the site will figure out that you're botting them after
about the first 50 and shut you down. Ever thought of just asking
them for the data?
--
Well, to go about this you should do the following:
1. Choose a user agent to emulate (Microsoft Internet Explorer or
Firefox).
2. Wait a random amount of time between requests so that you don't
send them at a constant rate.
Both of these options are available in cURL and also in Wget, so you
could use either of them. You might even want to do it from more than
one IP.
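As a rough illustration of the two steps above, here is a minimal sketch in Python using the standard library's urllib rather than cURL or Wget. The User-Agent string, the 2 to 10 second delay range, and the helper names (build_request, random_delay, fetch_all) are all assumptions made for the example, not anything the remote site requires.

```python
import random
import time
import urllib.request

# Assumed browser identification string; any real browser string would do.
USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) "
    "Gecko/20080311 Firefox/2.0.0.13"
)


def build_request(url, user_agent=USER_AGENT):
    """Step 1: attach a browser-like User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def random_delay(min_seconds=2.0, max_seconds=10.0):
    """Step 2: pick a random pause length (the range is an assumption)."""
    return random.uniform(min_seconds, max_seconds)


def fetch_all(urls, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random time between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(random_delay())
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages
```

Calling fetch_all with a list of query URLs returns the raw response bodies in order. The opener and sleep parameters exist only so the loop can be tried out without touching the network.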
That's my opinion. If you need any more help with spidering the data,
just drop me an email.
Regards,
George Maicovschi.