Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum



On Dec 30, 4:14 pm, Richard Heathfield <r...@xxxxxxxxxxxxxxx> wrote:
Malcolm McLean said:



"spinoza1111" <spinoza1...@xxxxxxxxx> wrote in message
You have told me, for example, that C is fully aware of international
strings. But when I try to re-use this code in .Net, it can't be
called directly from C Sharp using a String object, because its
interface appears in C Sharp as consisting of sbyte arrays. I fear
that there is no way of converting sbyte arrays to and from two-byte
wide character arrays without an extra loop...although there "might"
be a single Pentium instruction to do so.
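
For illustration, the "extra loop" in question can be sketched in C as below. This is a minimal sketch, not anything taken from the code under discussion: the name widen_latin1 is hypothetical, and it assumes the bytes hold ASCII or ISO-8859-1 text, where each byte value equals its Unicode code point. Any other byte encoding would need a real decoder rather than a plain widening.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen n bytes of ASCII/ISO-8859-1 text into
   16-bit code units. Assumes each byte maps directly to the code
   point of the same value, which holds only for those encodings. */
void widen_latin1(const signed char *src, uint16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Go through unsigned char so bytes >= 0x80 do not sign-extend. */
        dst[i] = (uint16_t)(unsigned char)src[i];
    }
}
```

The cast through unsigned char is the one subtle point: widening a negative sbyte directly would produce values such as 0xFFE9 instead of 0x00E9.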

The identity of char and byte, with hindsight, was a mistake.

I'm not convinced, Malcolm. What makes you think it's a mistake?

3000 Chinese characters minimum needed to read the People's Daily, for
starters.

However it has the good effect that it encourages the use of ASCII, which
is the one de facto universal standard for data representation.

No, it isn't. For one thing, EBCDIC is used to store a vast amount of data.

Yes, sadly enough. But there's a near isomorphism because both EBCDIC
and ASCII today use 8 bits (yes, I know about 7 bits, another error).


For another, how, precisely, do you represent (in the sense of
"symbolise") the Polish for "represent" (in the sense of "symbolise")
using ASCII? ASCII unaccountably lacks the c-acute character which is
necessary for this representation to be possible.

But Unicode has it. And why are you shooting yourself in the foot,
Richard?


Non-English languages can be built on top of ASCII, as has happened
successfully with HTML.

In other words, ASCII is inadequate to the task. I agree. And non-English
languages could be built on top of any inadequate encoding; there's
nothing special about ASCII.
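
As a concrete case of building on top of ASCII in the HTML manner, the c-acute mentioned above (U+0107) can be written with ASCII bytes alone as the decimal numeric character reference &#263;. The following is a minimal sketch under stated assumptions: ascii_escape is a hypothetical name, and it deliberately ignores that real HTML also requires escaping ASCII characters such as & and <.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write one Unicode code point into buf using only
   ASCII bytes. Code points below 128 are copied as-is; anything else
   becomes an HTML decimal numeric character reference such as "&#263;".
   Returns the number of characters written (excluding the NUL), or -1
   if buf is too small. Markup-significant ASCII is NOT escaped here. */
int ascii_escape(unsigned long cp, char *buf, size_t size)
{
    int n;
    if (cp < 128)
        n = snprintf(buf, size, "%c", (int)cp);
    else
        n = snprintf(buf, size, "&#%lu;", cp);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The same scheme would work over any encoding that contains the digits and a few punctuation marks, which is consistent with the point that nothing here is special to ASCII.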

Anything can be built on top of anything else. Unfortunately, this
gives privilege and economic power to a state which made it crystal
clear that it would "go it alone" and act exclusively in self
interest, whether by invading Iraq or supporting the apartheid Israeli
state. The market world wide is decoupling itself from this
dependency.



Non-ASCII representations certainly exist, but
they risk being unreadable on some platforms,

So does ASCII.

So does anything. But you can't use pragmatic-negative argumentation
at will.



and in fact they all have
weaknesses which make them undesirable as replacements for ASCII.

But you have said yourself that ASCII needs to be replaced by something
that can represent non-English languages.

Bang bang we all shoot ourselves in the foot. Are you even aware that
modern strings on .Net use two bytes consistently?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
