Re: Proxy and LWP::UserAgent
- From: RedGrittyBrick <RedGrittyBrick@xxxxxxxxxxxxx>
- Date: Thu, 30 Jun 2005 18:03:18 +0000 (UTC)
Mike wrote:
> I have put together a pretty good working webcrawler using ActivePerl for Windows. I am trying to get it to access a proxy from a list of PUBLIC proxies stored in a text file. I seem to have trouble in this area. I am using a standard agent and setting up the proxy in the standard way, using a public proxy from the text file.
> The way I read the description of the proxy setting in LWP::UserAgent is (bear with me on this logic): I am telling the server whose web page I am trying to access to send the data to the proxy address I have set up in UserAgent. The problem is that I have not notified the public proxy where to send the data. In other words, I have not beforehand notified the proxy server to send the data to my IP address, and I believe this is where I am having trouble.
Unless I misunderstand, your description of HTTP proxy operation is incorrect.
Let's say you wish to retrieve a web page at http://www.example.com/support/manual.html
Without a proxy, your script makes a TCP connection to www.example.com (on TCP port 80) and sends a request:
"GET /support/manual.html"
If you have a proxy at http://proxy.myco.org:3128, then your script instead makes a TCP connection to proxy.myco.org (on TCP port 3128) and sends a request:
"GET http://www.example.com/support/manual.html"
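Since HTTP is stateless there's no question of having "beforehand notified the proxy server": each request tells the proxy which URL to fetch, and the response comes back over the same TCP connection your script opened.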
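To make the difference between the two requests concrete, here is a minimal sketch in core Perl. The helper name request_line is made up for this illustration (it is not part of LWP), it only handles plain http:// URLs, and it adds the HTTP/1.0 version token that a real request line carries; a real client would also send headers such as Host.

```perl
use strict;
use warnings;

# Hypothetical helper (not an LWP function): build the first line of a GET.
# Direct requests name only the path; proxied requests name the absolute URL.
sub request_line {
    my ($url, $via_proxy) = @_;

    # Assumption: only plain http:// URLs are handled in this sketch.
    my ($path) = $url =~ m{^http://[^/]+(/.*)?$}
        or die "Unsupported URL: $url";
    $path = '/' unless defined $path;

    my $target = $via_proxy ? $url : $path;
    return "GET $target HTTP/1.0";
}

my $url = 'http://www.example.com/support/manual.html';
print request_line($url, 0), "\n";   # what goes to www.example.com:80
print request_line($url, 1), "\n";   # what goes to proxy.myco.org:3128
```

The only thing that changes is the request target and the machine the TCP connection is made to; nothing has to be registered with the proxy in advance.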
In very simplified form, ignoring caching, what happens is ...
1. Script opens TCP connection 1 to proxy
2. Script sends GET request to proxy
3. Proxy opens connection 2 to webserver
4. Proxy sends modified GET request to webserver
5. Webserver sends response to proxy
6. Connection 2 is closed
7. Proxy sends response to script
8. Connection 1 is closed
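Purely to illustrate the sequence above from the script's side, here is a rough sketch using the core IO::Socket::INET module. The sub name fetch_via_proxy is invented for this example, it speaks a bare HTTP/1.0 GET with no extra headers, and it is not how LWP does it internally. Connection 2 never appears in the code, because only the proxy sees it.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper, for illustration only: fetch $url through an HTTP proxy.
sub fetch_via_proxy {
    my ($proxy_host, $proxy_port, $url) = @_;

    # Connection 1: script -> proxy.
    my $sock = IO::Socket::INET->new(
        PeerAddr => $proxy_host,
        PeerPort => $proxy_port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "Cannot connect to proxy $proxy_host:$proxy_port: $@";

    # The GET names the absolute URL, which tells the proxy where to go.
    # (A real client would also send headers such as Host.)
    print $sock "GET $url HTTP/1.0\r\n\r\n";

    # The proxy talks to the web server on its own connection, then relays
    # the response back to us. Read until the proxy closes connection 1.
    local $/;                      # slurp mode: read to end of stream
    my $response = <$sock>;
    close $sock;
    return $response;
}
```

With the addresses from this post, the call would be fetch_via_proxy('proxy.myco.org', 3128, 'http://www.example.com/support/manual.html'), and the return value is the raw response (status line, headers and body) as relayed by the proxy.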
> I was looking for any information or any help in the use of proxies and LWP::UserAgent.
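You barely have to change your code; the same code can work with or without proxies. If you want to use a proxy you can set the environment variable HTTP_proxy=http://proxy.myco.org:3128, provided your script asks LWP::UserAgent to read it by calling $ua->env_proxy (or by creating the agent with env_proxy => 1).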
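A minimal sketch of the environment-variable route follows. The proxy URL is just the example address from this post, and the variable is set inside the script only so the sketch is self-contained; normally it would be set outside, for example with "set HTTP_proxy=..." in a Windows command prompt. As far as I know env_proxy matches the *_proxy variable names case-insensitively.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumption: example proxy address from the post, not a real server.
# Normally this would be set in the shell before the script runs.
$ENV{http_proxy} = 'http://proxy.myco.org:3128';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;    # load proxy settings from the *_proxy environment variables

# With one argument, proxy() returns the proxy currently set for that scheme.
print "HTTP proxy in use: ", ($ua->proxy('http') // '(none)'), "\n";
```

After this, an ordinary $ua->get(...) for an http:// URL goes through the proxy with no other change to the crawler code.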
Alternatively you can add a statement to your script that tells LWP::UserAgent to use a proxy. I wouldn't try to deal with it all at the low level described above.
This is described in the LWP::UserAgent documentation. Go to www.google.com, type LWP::UserAgent and click "I'm feeling Lucky".
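For the explicit route, something along these lines might do for a list of proxies kept in a text file. The helper names read_proxies and agent_for_proxy are invented for this sketch, and the file format (one proxy URL such as http://host:port per line, with blank lines and # comments ignored) is my assumption, since the original file format was not posted.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: read one proxy URL per line from a text file.
# Assumed format: http://host:port, blank lines and "#" comments skipped.
sub read_proxies {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @proxies;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        next if $line eq '' || $line =~ /^#/;
        push @proxies, $line;
    }
    close $fh;
    return @proxies;
}

# Hypothetical helper: a user agent that sends http:// requests via $proxy_url.
sub agent_for_proxy {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);
    $ua->proxy('http', $proxy_url);
    return $ua;
}
```

A crawler could then loop over read_proxies('proxies.txt') (a made-up file name), call agent_for_proxy($proxy)->get($url) for each one, and inspect $response->status_line to see which public proxies actually answer.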
If your code isn't working, post it (after reading the posting guidelines at http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.text).