Re: urllib interpretation of URL with ".."
- From: John Nagle <nagle@xxxxxxxxxxx>
- Date: Mon, 25 Jun 2007 09:42:31 -0700
Duncan Booth wrote:
"Martin v. Löwis" <martin@xxxxxxxxxxx> wrote:
Is "urllib" wrong?
Section 5.2 is also relevant here. In particular:
g) If the resulting buffer string still begins with one or more
complete path segments of "..", then the reference is
considered to be in error. Implementations may handle this
error by retaining these components in the resolved path (i.e.,
treating them as part of the final URI), by removing them from
the resolved path (i.e., discarding relative levels above the
root), or by avoiding traversal of the reference.
The common practice seems to be for client-side implementations to handle this using option 2 (removing them) and for servers to use option 3 (avoiding traversal of the reference). urllib uses option 1, which is also correct but not as useful as it might be.
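To make the three options in step (g) concrete, here is a minimal sketch in Python. The function name `handle_excess_dotdot` and the option labels "retain", "remove", and "reject" are made up for illustration and are not part of urllib. It assumes the path has already been merged with the base and that only leading ".." segments remain. Raising an error is just one possible reading of "avoiding traversal of the reference".

```python
def handle_excess_dotdot(path, option):
    """Apply one of the three RFC 2396 5.2 step (g) options.

    `path` is an absolute path such as "/../../a/b.html".
    `option` is a hypothetical label: "retain", "remove", or "reject".
    """
    # Drop the empty string that precedes the leading "/".
    segments = path.split("/")[1:]

    # Count the complete ".." segments at the start of the path.
    excess = 0
    while excess < len(segments) and segments[excess] == "..":
        excess += 1

    if excess == 0:
        return path  # nothing to fix

    if option == "retain":
        # Option 1: keep the ".." segments as part of the final URI.
        return path
    if option == "remove":
        # Option 2: discard the levels above the root.
        return "/" + "/".join(segments[excess:])
    if option == "reject":
        # Option 3: refuse to traverse the reference at all.
        raise ValueError("reference climbs above the root: %r" % path)
    raise ValueError("unknown option: %r" % option)
```

With "/../../a/b.html" as input, "retain" returns the path unchanged, "remove" returns "/a/b.html", and "reject" raises ValueError.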
That's helpful. Thanks.
In Python, of course, "urlparse.urlparse", which is the main function
used to disassemble a URL, has no idea whether it's being used by a
client or a server, so it, reasonably enough, takes option 1.
(Yet another hassle in processing real-world HTML.)
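A quick way to see that the parser leaves the ".." segments alone is to split a URL and look at the path. This sketch uses the Python 3 module name `urllib.parse` (an assumption; the module was called `urlparse` when this was written), and the URL is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL whose path starts with ".." segments.
url = "http://www.example.com/../../images/logo.png"

parts = urlsplit(url)

# Splitting only takes the URL apart; it does not normalize the path,
# so the leading ".." segments are still present.
print(parts.netloc)  # www.example.com
print(parts.path)    # /../../images/logo.png
```

Any cleanup of the leading ".." has to be done by the caller, who knows whether the code is acting as a client or a server.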
John Nagle