Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum



Malcolm McLean said:

"Richard Heathfield" <rjh@xxxxxxxxxxxxxxx> wrote in message
Malcolm McLean said:

The identity of char and byte, with hindsight, was a mistake.

I'm not convinced, Malcolm. What makes you think it's a mistake?

There's no logical connection between the number of bits used to
represent a human-readable character and the smallest addressable unit of
memory.

Thank you. I'm still not convinced, but your argument is not without merit.


Typically bytes are 8 bits and chars are ASCII, so sizeof(char)
equals 1 byte, but that's just a coincidence.

Well, ASCII is of course 7 bits, so it isn't even a coincidence. But it
seems to me that, when C was developed, ASCII and EBCDIC were the dominant
character sets, and 8 bits was sufficient to represent either of them. I
claim no inner knowledge about Dennis Ritchie's design decisions, but it
may be that he simply didn't think of "byte" as a particularly important
concept; he was more interested in characters and numbers than in the
nitty-gritty details of storage and representation. This is not an
unreasonable stance even today - one might even say /especially/ today.
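
A minimal C sketch of the char/byte point, for anyone who wants to check their own implementation. The function names are made up for illustration; the only things taken from the language itself are CHAR_BIT and CHAR_MAX from <limits.h> and the rule that sizeof counts in units of char.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one C byte, which is by definition the width of a char.
   The standard requires at least 8; it does not require exactly 8. */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* sizeof counts in units of char, so this is 1 on every conforming
   implementation, whatever the width of the byte. */
size_t char_size_in_bytes(void)
{
    assert(sizeof(char) == 1);
    return sizeof(char);
}

/* Nonzero if every 7-bit ASCII code (0..127) fits in a plain char. */
int ascii_fits_in_char(void)
{
    return CHAR_MAX >= 127;
}
```

So sizeof(char) == 1 holds by definition rather than by coincidence; what varies between machines is CHAR_BIT.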


However it has the good effect that it encourages the use of ASCII,
which is the one de facto universal standard for data representation.

No, it isn't. For one thing, EBCDIC is used to store a vast amount of
data.
I doubt that there are many such machines without a heavily used ASCII to
EBCDIC conversion utility.

Right. ASCII to EBCDIC conversion is commonly done (IME, at any rate). It
is less common to see EBCDIC to ASCII conversion - once it's on the
mainframe, it *stays* there. :-) With the rise of the PC over the last 30
years or so, one common use for it is as a smart dumb terminal for a
mainframe, so yes, EBCDIC-to-ASCII conversion happens there, but only for
display purposes (and a goodly number of "truly" dumb 3270 terminals are
still in use, for which no conversion is necessary TTBOMKAB).
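
For illustration, a rough C sketch of what such a conversion looks like. It handles only letters, digits and the space, and it assumes the EBCDIC layout of code page 037 for those characters; the function name is hypothetical and a real utility would use a full 256-entry table for a specific code page.

```c
#include <assert.h>

/* Sketch only: map a 7-bit ASCII letter, digit, or space to EBCDIC.
   Returns -1 for any other ASCII code. ASCII codes are written in hex
   so the code does not depend on the compiler's own character set. */
int ascii_to_ebcdic(int c)
{
    assert(c >= 0 && c <= 0x7F);                        /* ASCII is 7-bit */

    if (c == 0x20) return 0x40;                         /* space */
    if (c >= 0x30 && c <= 0x39) return 0xF0 + (c - 0x30); /* 0-9 */

    /* EBCDIC letters sit in three separate runs, not one block. */
    if (c >= 0x41 && c <= 0x49) return 0xC1 + (c - 0x41); /* A-I */
    if (c >= 0x4A && c <= 0x52) return 0xD1 + (c - 0x4A); /* J-R */
    if (c >= 0x53 && c <= 0x5A) return 0xE2 + (c - 0x53); /* S-Z */
    if (c >= 0x61 && c <= 0x69) return 0x81 + (c - 0x61); /* a-i */
    if (c >= 0x6A && c <= 0x72) return 0x91 + (c - 0x6A); /* j-r */
    if (c >= 0x73 && c <= 0x7A) return 0xA2 + (c - 0x73); /* s-z */

    return -1;                                          /* not covered here */
}
```

The gaps between the letter runs are why EBCDIC code cannot rely on 'A' through 'Z' being contiguous, which ASCII-only code often does.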

However if I have one on my PC I don't know
where to find it. You can pretty much guarantee that an ASCII file will
be human-readable on your machine.

Well, it would be trivial to come up with counter-examples, but yes, I know
what you mean.

For another, how, precisely, do you represent (in the sense of
"symbolise") the Polish for "represent" (in the sense of "symbolise")
using ASCII? ASCII unaccountably lacks the c-acute character which is
necessary for this representation to be possible.

&c_acute;
or something similar.

Ah, I see what you mean. Thank you for explaining.
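
To make the c-acute point concrete, here is a small C sketch. It assumes the word in question is "reprezentować" and that it is stored as UTF-8, where c-acute (U+0107) becomes the two bytes 0xC4 0x87; neither assumption comes from the thread, and the names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if every byte in the buffer is a 7-bit ASCII code (0x00..0x7F). */
int is_pure_ascii(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return 0;
    }
    return 1;
}

/* "reprezentowa" in ASCII codes, then c-acute as UTF-8 (0xC4 0x87). */
const unsigned char polish_word[] = {
    0x72, 0x65, 0x70, 0x72, 0x65, 0x7A, 0x65, 0x6E,
    0x74, 0x6F, 0x77, 0x61, 0xC4, 0x87
};
```

The first twelve bytes pass the check and the whole word does not, so the word cannot be written in ASCII without some escape convention such as the entity-style spelling suggested above.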

But you have said yourself that ASCII needs to be replaced by something
that can represent non-English languages.

Yes. If we were inventing the computer from scratch we wouldn't use
ASCII. Unfortunately the "cover every glyph" approach has two problems:
the fonts become too difficult to implement, and keyboards typically
don't have the characters. So ASCII has stuck.

This whole subject is one for which a very good solution is most unlikely
to gain any ground, because "good-enough" solutions have prevailed for too
long.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999