Re: urllib interpretation of URL with ".."



"Martin v. Löwis" <martin@xxxxxxxxxxx> wrote:

Is "urllib" wrong?

I can't see how. HTTP 1.1 says that the parameter to the GET
request should be an abs_path; RFC 2396 says that
/../acatalog/shop.html is indeed an abs_path, as .. is a valid
segment. That RFC also has a section on relative identifiers
and normalization; it defines what .. means *in a relative path*.

Section 4 is explicit about .. in absolute URIs:
# The syntax for relative URI is a shortened form of that for absolute
# URI, where some prefix of the URI is missing and certain path
# components ("." and "..") have a special meaning when, and only when,
# interpreting a relative path.

Notice the "and only when": browsers that rewrite the above
URL before sending it seem to be in clear violation of
RFC 2396.

Section 5.2 is also relevant here. In particular:

g) If the resulting buffer string still begins with one or more
complete path segments of "..", then the reference is
considered to be in error. Implementations may handle this
error by retaining these components in the resolved path (i.e.,
treating them as part of the final URI), by removing them from
the resolved path (i.e., discarding relative levels above the
root), or by avoiding traversal of the reference.

Common practice seems to be for client-side implementations to take the
second option (removing the ".." segments) and for servers to take the third
(avoiding traversal of the reference). urllib takes the first option
(retaining them), which is also correct but not as useful as it might be.
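
To make the three choices in step g) concrete, here is a minimal sketch. It is not urllib's code; the function name and the policy names ("retain", "remove", "refuse") are made up for illustration, and it assumes the path is absolute and has already been through the earlier steps of 5.2, so the only ".." segments left are the leading ones.

```python
def handle_leading_dotdot(path, policy="retain"):
    """Apply one of the three RFC 2396 5.2 step g) choices to an absolute path.

    Hypothetical helper for illustration only; `policy` names are assumptions.
    """
    segments = path.split("/")
    # For an absolute path such as "/../a/b", segments[0] is the empty string.
    i = 1
    while i < len(segments) and segments[i] == "..":
        i += 1
    if i == 1:
        return path  # no leading ".." segments, nothing to decide
    if policy == "retain":
        # Option 1: keep the ".." segments as part of the final URI.
        return path
    if policy == "remove":
        # Option 2: discard the relative levels above the root.
        return "/" + "/".join(segments[i:])
    if policy == "refuse":
        # Option 3: avoid traversing the reference at all.
        raise ValueError("path climbs above the root: %r" % path)
    raise ValueError("unknown policy: %r" % policy)


print(handle_leading_dotdot("/../acatalog/shop.html", "retain"))
print(handle_leading_dotdot("/../acatalog/shop.html", "remove"))
```

With "retain" the path goes out unchanged, which is the behaviour under discussion; with "remove" it becomes /acatalog/shop.html, which is what the browsers appear to send.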
