Re: Proxy and LWP::UserAgent



Mike wrote:
I have put together a pretty good working web crawler using ActivePerl for Windows. I am trying to get it to access a proxy from a list of PUBLIC proxies stored in a text file. I seem to have trouble in this area. I am using a standard agent to access and I am setting up the proxy in the standard way using a public proxy from the text file.
The way I am reading the description of a proxy setting in LWP::UserAgent is (bear with me on this logic): I am telling the server whose web page I am trying to access to send the data to the proxy address I have set up in UserAgent. The problem is that I have not notified or accessed the public proxy to tell it where to send the data. In other words, I have not before hand notified the proxy server to send the data to my IP address and I believe this is where I am having trouble.

Unless I misunderstand, your description of HTTP proxy operation is incorrect.


Let's say you wish to retrieve a web page at http://www.example.com/support/manual.html

Without a proxy, your script makes a TCP connection directly to www.example.com (on TCP port 80) and sends a request:
"GET /support/manual.html"


If you have a proxy at http://proxy.myco.org:3128 then your script instead makes a TCP connection to proxy.myco.org (on TCP port 3128) and sends a request containing the full URL:
"GET http://www.example.com/support/manual.html"


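Since HTTP is stateless there's no question of "before hand notified the proxy server". The proxy sends the response back over the same TCP connection your script opened, so it never needs to be told your IP address in advance.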
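
The difference between the two cases above can be sketched in a few lines of Perl. This is only an illustration: plan_request is a hypothetical helper (not part of LWP), it handles plain http:// URLs only, and it keeps the simplified request line used in this post (a real request also carries an HTTP version and headers).

```perl
use strict;
use warnings;

# Hypothetical helper: return (host, port, request line) for fetching $url,
# either directly or through the proxy at $proxy_url. Plain http:// only.
sub plan_request {
    my ($url, $proxy_url) = @_;

    my ($host, $port, $path) = $url =~ m{^http://([^/:]+)(?::(\d+))?(/.*)?$}
        or die "unsupported URL: $url\n";
    $port ||= 80;                        # default HTTP port
    $path = '/' unless defined $path;    # empty path means the root document

    if (defined $proxy_url) {
        my ($proxy_host, $proxy_port) = $proxy_url =~ m{^http://([^/:]+):(\d+)/?$}
            or die "unsupported proxy URL: $proxy_url\n";
        # Connect to the proxy and ask for the full URL.
        return ($proxy_host, $proxy_port, "GET $url");
    }

    # Connect to the web server itself and ask for the path only.
    return ($host, $port, "GET $path");
}

my $url = 'http://www.example.com/support/manual.html';
printf "direct:    connect to %s:%s, send \"%s\"\n", plan_request($url);
printf "via proxy: connect to %s:%s, send \"%s\"\n",
    plan_request($url, 'http://proxy.myco.org:3128');
```

In both cases the script is the one that opens the connection, so the reply simply comes back on that same connection.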

In very simplified form, ignoring caching, what happens is ...

script opens TCP connection 1 to proxy
script sends GET request to proxy
    Proxy opens connection 2 to webserver
    Proxy sends modified GET request to webserver
    Webserver sends response to Proxy
    Connection 2 is closed
Proxy sends response to script
Connection 1 is closed


Mike wrote:
I was looking for any information or any help in the use of proxies and LWP::UserAgent.



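You barely have to change your code; the same code can work with or without proxies. If you want to use a proxy you can set the environment variable HTTP_proxy=http://proxy.myco.org:3128. Note that LWP::UserAgent picks this variable up when your script calls $ua->env_proxy (or passes env_proxy => 1 to the constructor), so make sure it does one of those.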
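
A minimal sketch of the environment-variable approach, assuming the proxy address from the example above. The variable would normally be set outside the script (for instance with "set HTTP_proxy=..." in a Windows command prompt); it is assigned inside the script here only to keep the sketch self-contained.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Normally set in the shell before the script runs; assigned here only
# so that this sketch can be run on its own.
$ENV{http_proxy} = 'http://proxy.myco.org:3128';

# env_proxy => 1 asks LWP::UserAgent to load proxy settings from the
# *_proxy environment variables when the agent is created.
my $ua = LWP::UserAgent->new(env_proxy => 1);

# Called with only a scheme, proxy() returns the current setting.
print "http requests will go via: ", $ua->proxy('http'), "\n";
```

From here on, an ordinary $ua->get(...) call is routed through the proxy without any other change to the script.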


Alternatively you can add a statement to your script that tells LWP::UserAgent to use a proxy, for example $ua->proxy('http', 'http://proxy.myco.org:3128'). I wouldn't try to deal with it all at the low level described above.
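
Since the original question involves a list of public proxies, here is a hedged sketch of setting the proxy in the script and falling back to the next proxy when one fails. The helper names agent_via_proxy and fetch_via_any_proxy are made up for this illustration, and the 15-second timeout is an arbitrary choice; only new, proxy, get, is_success and status_line come from LWP itself.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that sends all http:// requests
# through the proxy at $proxy_url.
sub agent_via_proxy {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);   # timeout chosen arbitrarily
    $ua->proxy('http', $proxy_url);
    return $ua;
}

# Hypothetical helper: try each proxy in turn and return the first
# successful response, or nothing if every proxy fails.
sub fetch_via_any_proxy {
    my ($url, @proxies) = @_;
    for my $proxy_url (@proxies) {
        my $response = agent_via_proxy($proxy_url)->get($url);
        return $response if $response->is_success;
        warn "$proxy_url failed: ", $response->status_line, "\n";
    }
    return;
}
```

The list passed to fetch_via_any_proxy could be read from the text file, one proxy URL such as http://proxy.myco.org:3128 per line, with chomp applied to each line first.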

This is described in the LWP::UserAgent documentation. Go to www.google.com, type LWP::UserAgent and click "I'm feeling Lucky".

If your code isn't working, post it (after reading the posting guidelines at http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.text).



