Re: urlencode and $_GET
- From: Oli Filth <catch@xxxxxxxxxxxxxx>
- Date: Fri, 18 Nov 2005 02:40:26 GMT
M. Trausch said the following on 18/11/2005 01:49:
Oli Filth wrote:
They aren't represented the same internally at all. A literal hash in a URL delimits a reference to a named anchor, whereas %23 does not; it's treated as part of the query string in the HTTP GET request. Try this simple test to demonstrate this:
That's very much like saying the character # on the right side of a hex dump and the '23' on the left side of a hex dump aren't represented the same internally at all. It's just a character reference, either way. Just because one may receive a flag that the other doesn't in one instance or several instances does not mean that it will in *all* instances.
%23 is a character reference, yes, but not an *HTML* character reference/entity; it's merely the URL (percent) encoding of # in an HTTP GET string, and means nothing in the context of HTML.
The browser treats %23 as exactly that: the literal characters %, 2, 3. In the context of a clicked hyperlink, these exact characters are transmitted in the corresponding HTTP GET request string. For example, the following link:
<A href="http://example.com/file.php?%23xyz">...</A>
will result in the following HTTP request:
GET /file.php?%23xyz HTTP/1.1
Host: example.com
At no point between the server delivering the original HTML to the browser and the server receiving the GET request has %23 been decoded.
On the other hand, the browser treats the literal # as a delimiter (as defined by HTML specs), and strips that (and everything after it) from the URL before the HTTP request is made. e.g. the following link:
<A href="http://example.com/file.php?#xyz">...</A>
will result in the following HTTP request:
GET /file.php? HTTP/1.1
Host: example.com
Entirely different behaviour, working at a different layer (HTML vs. HTTP), completely defined by the specs (W3C HTML specs, and RFC 1738).
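For anyone who wants to check the two cases above without a browser, here is a rough sketch of the client-side split. It is written in Python purely for illustration (the thread itself is about PHP), the helper name request_target is made up for this example, and it assumes an absolute URL of the form scheme://host/... as in the links above. It is not how any particular browser is implemented, only a model of the behaviour described.

```python
from urllib.parse import urlsplit


def request_target(href):
    """Sketch: return (request_target, fragment) for an absolute http URL.

    Hypothetical helper for illustration only; assumes href looks like
    scheme://host/path?query#fragment.
    """
    # Everything from the first literal '#' onward is the fragment.
    # It stays on the client and is never put into the HTTP request.
    without_fragment, _, fragment = href.partition("#")

    # Drop the scheme and host; what is left is sent as written,
    # with no percent-decoding, so %23 stays %23.
    parts = urlsplit(without_fragment)
    prefix = f"{parts.scheme}://{parts.netloc}"
    target = without_fragment[len(prefix):] or "/"
    return target, fragment


print(request_target("http://example.com/file.php?%23xyz"))
print(request_target("http://example.com/file.php?#xyz"))
```

The first call keeps %23xyz in the request target and has an empty fragment; the second sends only /file.php? and keeps xyz on the client side as the fragment.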
If you had tried the demo code I posted earlier, you would see this in action.
Where is it defined as "unsafe", except in RFC 1738 where it states that it's unsafe to use # unless to delimit a named anchor reference?
Show me an example where it doesn't work...
The fact is that the published standard which addresses the issue states that it's unsafe.
No, it states that it's unsafe to use # in cases other than where you mean it to be a delimiter for an HTML anchor identifier.
In cases where you do not intend it as a delimiter, you should encode it with the alternative, %23, because this *is* safe (defined as such in RFC 1738), and when received by the agent processing the HTTP GET request (i.e. the server), it is translated into the originally intended character, i.e. #.
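A minimal sketch of that round trip, again in Python for illustration: quote() and parse_qsl() from urllib.parse stand in for PHP's urlencode() and for the decoding PHP does before it fills $_GET. The parameter name "tag" is invented for the example.

```python
from urllib.parse import quote, parse_qsl

value = "#xyz"

# Sender side: encode the value so the '#' cannot be read as a delimiter.
encoded = quote(value, safe="")
query = "tag=" + encoded  # "tag" is a hypothetical parameter name

# Receiver side: the server decodes the query string back to the
# originally intended characters.
received = dict(parse_qsl(query))

print(query)
print(received["tag"])
```

The encoded query string contains %23 rather than a literal #, and the value recovered on the receiving side is the original "#xyz".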
It is wise to be cautious and write defensively toward something you can refer to, rather than away from it, even if it does work on 98% of the browsers. My point was that you cannot make a blanket assumption about something when it's already known that it's unsafe and the behavior of an action is undefined.
However, the behaviour *is* *completely* defined, so any agent (browser, server, or otherwise) that behaves differently is in explicit breach of the specs, i.e. a bug.
--
Oli