Re: urlencode vs rawurlencode
From: John Dunlop (usenet+2004_at_john.dunlop.name)
Date: 04/22/04
Date: Thu, 22 Apr 2004 18:33:12 +0100
Joshua Beall wrote:
> I can see from the manual that the difference between urlencode and
> rawurlencode is that urlencode translates spaces to '+' characters, whereas
> rawurlencode translates it into its hex code.
>
> My question is, is there any real world difference between these two
> functions?
I don't know.
> Or perhaps another way of asking the question: *why* are there two
> different functions?
A good question. I don't know the answer to that either.
A plus sign is reserved in the query component. A reserved character
may be used for its reserved purpose or, if it doesn't conflict with
the reserved purpose, as data.
Encoding spaces as plus signs is specific to form encoding. The
HTML4.01 specification describes the encoding process: "[i]f the
method is 'get' and the action is an HTTP URI, the user agent takes
the value of action, appends a `?' to it, then appends the form data
set, encoded using the 'application/x-www-form-urlencoded' content
type" (HTML4.01, sec. 17.13.3). So, here, spaces are encoded as plus
signs; elsewhere, spaces are encoded as "%20", as explained in
RFC2396, section 2.4.
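To make the two conventions concrete, here is a minimal sketch in PHP. The value "foo bar" is only an illustration; any string containing a space behaves the same way.

```php
<?php
// Illustrative value; any string containing a space shows the difference.
$value = 'foo bar';

// urlencode follows the application/x-www-form-urlencoded convention:
// a space becomes a plus sign.
$form = urlencode($value);

// rawurlencode percent-encodes the space instead.
$raw = rawurlencode($value);

echo $form, "\n"; // foo+bar
echo $raw, "\n";  // foo%20bar
```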
Consider:
1. <http://domain.example/?baz=foo+bar>
2. <http://domain.example/?baz=foo%20bar>
3. <http://domain.example/?baz=foo%2Bbar>
All three are syntactically valid URIs. The first could be a URI
generated from an HTML form, where the action specified was
<http://domain.example/>, the method GET and the form data set
consisting of a control named "baz" with current value "foo bar". The
space in the current value is replaced with a plus sign.
Reading Björn Höhrmann's explanation of reserved characters in
"Re: Good/Bad - URI encoding in HTML editor",
http://lists.w3.org/Archives/Public/uri/2002May/0032.html
we see that numbers one and two are *not* equivalent.
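How the three values above are read depends on which convention the receiver decodes with. The following is only a sketch using PHP's two decoders; which one a given server or script actually applies is an assumption you would have to check for your own setup.

```php
<?php
// The three query values from the example URIs above.
$values = ['foo+bar', 'foo%20bar', 'foo%2Bbar'];

$decoded = [];
foreach ($values as $v) {
    $decoded[$v] = [
        // Form convention: "+" means space, %XX is unescaped.
        'form' => urldecode($v),
        // Generic convention: only %XX is unescaped, "+" stays a plus sign.
        'raw'  => rawurldecode($v),
    ];
}

foreach ($decoded as $v => $d) {
    printf("%-10s form: %-8s raw: %s\n", $v, $d['form'], $d['raw']);
}
```

Under the form convention, numbers one and two decode to the same string; under the generic convention they do not, and number one decodes to the same string as number three.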
Also related is Terje Bless' request for clarification
"Ambiguity of Allowed/Recommended URI Syntax and Escaping",
http://lists.w3.org/Archives/Public/uri/2002Nov/0014.html
> In what situation would you need one, and not be able to use the other?
That depends on the URI generator, I think.
The documentation for urlencode says "[t]his function is convenient
when encoding a string to be used in a query part of a URL" [1]. I
don't see any reason to favour it over rawurlencode, however, which
encodes as per section 2.4 of RFC2396 (modulo the fact that it always
encodes certain unreserved characters [2]).
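To see which unreserved characters rawurlencode escapes on a given installation, something like the sketch below could be used. The list of "mark" characters is taken from section 2.3 of RFC2396; the variable names are mine. The result may differ between PHP versions, so no output is shown.

```php
<?php
// The "mark" characters that RFC 2396, section 2.3, lists as unreserved.
$marks = ['-', '_', '.', '!', '~', '*', "'", '(', ')'];

$escaped = [];
foreach ($marks as $c) {
    // A character was escaped if the output differs from the input.
    if (rawurlencode($c) !== $c) {
        $escaped[] = $c;
    }
}

echo 'rawurlencode escapes: ', implode(' ', $escaped), "\n";
```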
Refs.:
"Uniform Resource Identifiers (URI): Generic Syntax", 1998,
http://www.ietf.org/rfc/rfc2396.txt
"Uniform Resource Locators (URL)", 1994,
http://www.ietf.org/rfc/rfc1738.txt
[1] "PHP: urlencode - Manual",
http://www.php.net/manual/en/function.urlencode.php
[2] Section 2.3 of RFC2396 says:
| Unreserved characters can be escaped without changing the semantics
| of the URI, but this should not be done unless the URI is being used
| in a context that does not allow the unescaped character to appear.
-- Jock