Re: Any ideas how to read a url that's changed by the server?
- From: Rik <luiheidsgoeroe@xxxxxxxxxxx>
- Date: Tue, 21 Aug 2007 16:01:07 +0200
On Tue, 21 Aug 2007 15:44:12 +0200, TechieGrl <cschaller@xxxxxxxxx> wrote:
Requesting and discarding several pages before you enter the 'real' data
shouldn't be a problem like this.
If you have a cookie with a session-id, you probably don't need it in the URL
(might be required though, I don't know which site).
Here's an example of a redirect - not the same site that I'm using,
but you can see what happens here.
When I type in http://my.opera.com, I am redirected to http://my.opera.com/community
Then when I click on a link, I go to a page that includes "community"
in the url - http://my.opera.com/community/blog/2007/08/17/member-of-the-week
I need to get from my.opera.com to the last url, but if the word
"community" was actually a changing session ID, then I would need to
check for that each time prior to getting to the page I really want,
member-of-the-week.
Does that make sense?
Could very well be. It all depends on how they implemented the session. If you enable cookies in CURL, on most sites you'll just use the cookies, without having to check the URL. If it enforces a GET session-id, you'll have to check that & continue to add it to subsequent requests (recheck for change, etc).
As said, you'll have to use curl_getinfo() to check for the ending URL, and possibly use curl_setopt() to get some headers which might be important.
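A rough sketch of that, assuming PHP's cURL extension is loaded. The helper names (redirect_options, fetch_final_url) and the cookie file path are made up for illustration, and the site may well need other options than these:

```php
<?php
// Hypothetical helper: options that let cURL handle cookies and redirects itself.
function redirect_options(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,          // arbitrary safety limit
        CURLOPT_HEADER         => true,        // include response headers in the returned string
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here on later requests
    ];
}

// Hypothetical helper: request $startUrl and report where the redirects ended up.
function fetch_final_url(string $startUrl, string $cookieFile): array
{
    $ch = curl_init($startUrl);
    curl_setopt_array($ch, redirect_options($cookieFile));
    $response = curl_exec($ch);                             // false on failure
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);  // the "ending" URL
    unset($ch);                                             // release the handle so the cookie jar is flushed
    return ['final_url' => $finalUrl, 'response' => $response];
}
```

Calling fetch_final_url('http://my.opera.com', '/tmp/cookies.txt') should then report the URL you were redirected to in 'final_url', with the headers of each hop at the start of 'response'.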
Useful functions here are also parse_url() & parse_str() for the returned (ending) URL. And if it doesn't work, check with a 'normal' browser what redirects/headers get sent (Fiddler for MSIE & LiveHTTPHeaders for FF come to mind), copy those to CURL, and remove them again one by one until you're left with the ones that really matter. It's all about discovering (knowing/asking (would be fastest...)) what the actual inner workings of the site are.
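To illustrate the parse_url()/parse_str() part: a minimal sketch that pulls a session ID out of an ending URL. The function name extract_session_id and the parameter name 'sid' are assumptions, since I don't know how the site in question names things. It looks in the query string first and falls back to the first path segment, which is where "community" sits in the my.opera.com example:

```php
<?php
// Hypothetical helper: 'sid' is an assumed query parameter name, adjust for the real site.
function extract_session_id(string $finalUrl, string $param = 'sid'): ?string
{
    // Case 1: the session ID is passed as a GET parameter (?sid=...).
    $query = parse_url($finalUrl, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $vars);   // fills $vars with the decoded query parameters
        if (isset($vars[$param]) && is_string($vars[$param])) {
            return $vars[$param];
        }
    }

    // Case 2: the session ID is the first segment of the path (/<id>/...).
    $path = parse_url($finalUrl, PHP_URL_PATH);
    if (is_string($path)) {
        $segments = explode('/', trim($path, '/'));
        if ($segments[0] !== '') {
            return $segments[0];
        }
    }

    return null;                    // nothing that looks like a session ID
}
```

For example, extract_session_id('http://my.opera.com/community') gives "community", and extract_session_id('http://example.com/page?sid=abc123') gives "abc123". You'd then prepend or append that value to the URL of the page you really want.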
Keep in mind that CURL works great as long as the site doesn't use javascript for some critical browsing/displaying/session functions. If it does, you're in for a very painstaking translation of the critical javascript code to the actual actions, which may break in the future with even a minimal change in the setup of the site.
--
Rik Wasmus