Re: Bugs in http
- From: "Donal K. Fellows" <donal.k.fellows@xxxxxxxxxxxxxxxx>
- Date: Thu, 12 Nov 2009 16:01:23 -0800 (PST)
On 12 Nov, 17:06, "tom.rmadilo" <tom.rmad...@xxxxxxxxx> wrote:
> If this were true, then server applications would do the translation
> to the network EOL sequence <CR><LF>. But this doesn't happen; the
> document is transferred as binary data, because it is. The
> Content-Length header is for bytes, and the body specifically does not
> end with an additional <CR><LF>.
>
> You're getting confused here. Servers shovel all sorts of crap down
> the pipe. They ship the bytes as they are because that's fastest.
> There are limitless complications to performing EOL translations. For
> instance, what if you are downloading source code, which is
> text/plain? With Windows you also have code pages, wide chars, etc.,
> which the server may not capture and include in the Content-Type.
Actually, servers are supposed to capture all of that and describe it
in the content type (through the charset parameter). That some fail
miserably to do this does not change the fact that they *should* do
it. End-of-line translations (and charset translations too) are
meaningful for all the text/* content types, though there may be times
when you don't want to perform them. (They're not meaningful at all
for the other major classes of content type, like image/* or
application/*, where only binary transfer even approaches sanity.)
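A minimal Python sketch of the split described above, not the Tcl http package's actual code: the byte-level framing (Content-Length) is checked first, and then charset decoding and end-of-line normalization are applied only to text/* bodies, while everything else is handed back as untouched bytes. The function name `decode_http_body` is hypothetical. Two fallbacks are assumptions: a missing Content-Type is treated as opaque binary, and a text/* body without a charset parameter is decoded as ISO-8859-1, the default that RFC 2616 specified for text/* received via HTTP.

```python
from email.message import Message
from typing import Optional, Union


def decode_http_body(body: bytes,
                     content_type: Optional[str],
                     content_length: Optional[int] = None) -> Union[str, bytes]:
    """Hypothetical helper: decode text/* bodies, keep other bodies as raw bytes."""
    # Content-Length counts bytes on the wire, so check framing before any
    # charset or end-of-line translation changes the size of the data.
    if content_length is not None and len(body) != content_length:
        raise ValueError(f"expected {content_length} bytes, got {len(body)}")

    # Reuse the stdlib MIME header parser to read type and charset parameter.
    msg = Message()
    # Assumption: a missing Content-Type is treated as opaque binary data.
    msg["Content-Type"] = content_type or "application/octet-stream"

    if msg.get_content_maintype() != "text":
        # image/*, application/*, ...: binary transfer, exact bytes preserved.
        return body

    # Assumption: fall back to ISO-8859-1 (the RFC 2616 default for text/*).
    charset = msg.get_content_charset("iso-8859-1")
    text = body.decode(charset)

    # Map CRLF and bare CR to "\n" (CRLF first, so it is not doubled).
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Usage: `decode_http_body("caf\u00e9\r\n".encode("utf-8"), "text/plain; charset=UTF-8", 6)` yields the string `"caf\u00e9\n"`, while an image/png body comes back byte-for-byte. A real client would also want a switch to skip the translation for the times when you don't want it.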
> But plain experience with how mainstream browsers work should
> illustrate the situation: download a Tcl source file and try to edit
> it with Notepad. For instance, take Tcl's 2007 changelog with your
> initials, download it with Mozilla, save it as a text file and open it
> with Notepad. So even saving to disk as a text document does not
> perform EOL translation (you get the expected garbage-looking text).
This doesn't change the fact that you're wrong. You're using a browser
in a particular mode ("save a copy", which isn't the mode in which I
think EOL translation is particularly useful) and you're also claiming
that the only possible interpretation of a text file is the one in
which it was originally created. That's just not true. In particular,
it would mean that changing the encoding would be prohibited, and that
therefore every application that might want to process that file must
play "guess the encoding and translation" with it, since the
(hopefully, but not necessarily, correct) metadata has been stripped
from it by that point. In fact, it's because of troubles like these
that certain high-quality text processing algorithms (e.g., XML
Signature) use a canonicalization step. (OK, for XML there are other
things that need doing too, but encoding handling is indeed part of
it.)
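To illustrate that a text file has no single "true" byte form, here is a minimal Python sketch (the name `retranslate` is hypothetical) that decodes bytes with one encoding and end-of-line convention and re-encodes them with another, e.g. turning an LF-terminated Tcl source file into the CRLF form Notepad expects. It relies on `io.TextIOWrapper`: `newline=None` on the read side enables universal newlines, and a non-empty `newline` on the write side translates every `"\n"` written. In Tcl itself, the corresponding knobs are the `-encoding` and `-translation` channel options of `fconfigure`.

```python
import io


def retranslate(raw: bytes, src_encoding: str,
                dst_encoding: str, dst_newline: str) -> bytes:
    """Hypothetical helper: re-encode text with a different charset and EOL."""
    # Read side: newline=None turns on universal newlines, so "\n", "\r\n"
    # and "\r" in the input all arrive as "\n" in the decoded string.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=src_encoding,
                            newline=None).read()

    # Write side: every "\n" written is translated to dst_newline.
    out_buf = io.BytesIO()
    with io.TextIOWrapper(out_buf, encoding=dst_encoding,
                          newline=dst_newline) as out:
        out.write(text)
        out.flush()
        # Grab the bytes before the context manager closes the buffer.
        return out_buf.getvalue()
```

The characters are the same before and after; only the encoding and line terminators change, which is exactly the kind of transformation that would be "prohibited" if a text file could only be interpreted the way it was created.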
The fundamental truth is that for text data, equivalence is not really
defined at the byte level. Instead, it is defined at the character
level, after end-of-line handling. A lot of existing code gets this
wrong; a lot of people still don't understand the differences between
bytes and characters.
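A small Python sketch of "equivalence at the character level after end-of-line handling": two byte strings in different encodings and with different line terminators compare unequal as bytes but equal once decoded and normalized. The name `text_equivalent` is hypothetical, and this is deliberately much simpler than a real canonicalization scheme such as the one XML Signature uses.

```python
def text_equivalent(a: bytes, a_charset: str, b: bytes, b_charset: str) -> bool:
    """Hypothetical helper: compare two texts as characters, not bytes."""

    def canon(data: bytes, charset: str) -> str:
        # Decode to characters, then fold CRLF and bare CR into "\n".
        text = data.decode(charset)
        return text.replace("\r\n", "\n").replace("\r", "\n")

    return canon(a, a_charset) == canon(b, b_charset)


unix = "na\u00efve\nline\n".encode("utf-8")
dos = "na\u00efve\r\nline\r\n".encode("utf-16")
print(unix == dos)                                   # byte-level comparison
print(text_equivalent(unix, "utf-8", dos, "utf-16"))  # character-level
```

The first comparison fails because the byte sequences differ in both encoding and line endings; the second succeeds because the underlying characters are the same.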
> However, I'm arguing something less: the data is binary/opaque during
> transfer. If the application can then figure out how to do a
> transform, great, but it isn't part of HTTP, and the HTTP part of the
> application should preserve the exact data it received.
I disagree. HTTP is not just a binary download protocol.
Donal.