Re: Attention: European C/C++/C#/Java Programmers-Call for Input
- From: Paul Keinanen <keinanen@xxxxxx>
- Date: Thu, 29 Jan 2009 17:54:15 +0200
On Thu, 29 Jan 2009 09:28:09 -0600, "Paul K. McKneely"
<pkmckneely@xxxxxxxxxxxxx> wrote:
Did you miss the key point? *UNICODE*. They very specifically choose a
*standard* for their encodings, not something incompatible and
proprietary. In particular, it's very useful to be able to write comments
and strings in Unicode - many modern languages allow it. If you had
suggested using Unicode, or Latin-1, or listened to the idea when it was
suggested, then you'd have got far more support - it's the idea of having a
proprietary half-baked encoding that is incompatible with every other tool
that is "incredibly stupid".
My fault for phrasing my original question badly. I should
never have mentioned the words "character set". Forget that
there is an internal encoding method that is used in the compiler
tools for this new language whose codes will never be seen by its users.
The programming language supports only a subset of the complete
UNICODE character set regarding the Western European
alphabetics. The language only recognizes a maximum of 254
alphanumerics (Basic Greek and Cyrillic are included) for variable
names etc. including the underscore which is regarded as alphabetic
but ordinally precedes all others. If Western European
programmers had to choose a subset of these for language
support, which ones would they be?
I still do not understand why you want to use your own internal
representation instead of e.g. UTF-8. For any language using a Latin
script for identifiers, the UTF-8 encoded length in bytes is 1.0 times,
or in rare cases about 1.1 times, the identifier length in characters.
For Cyrillic or Greek, the ratio is 2.0.
So the extra memory consumption, e.g. in compiler symbol tables, is
negligible.
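To make the ratios concrete, here is a minimal Python sketch that compares the UTF-8 byte length of an identifier with its length in characters. The function name utf8_ratio and the sample identifiers are made up for illustration; they are not taken from the language being discussed.

```python
def utf8_ratio(identifier: str) -> float:
    """Return UTF-8 bytes per character for the given identifier."""
    return len(identifier.encode("utf-8")) / len(identifier)


# Hypothetical sample identifiers, one per script.
samples = {
    "ASCII": "counter",
    "Latin with one umlaut": "z\u00e4hler",  # a-umlaut takes 2 bytes in UTF-8
    "Cyrillic": "длина",
    "Greek": "αβγ",
}

for script, name in samples.items():
    print(f"{script}: {name!r} ratio = {utf8_ratio(name):.2f}")
```

Pure ASCII identifiers stay at exactly one byte per character, an occasional accented Latin letter adds one byte, and basic Cyrillic or Greek letters take two bytes each.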
Regarding linkers, UTF-8 global symbol names should not be a problem,
unless the object language uses the 8th bit for some kind of signaling
(such as end of string) or otherwise limits the valid bit
combinations.
Of course the UTF-8 encoding may increase the identifier length, but
at least for a linker that usually examines only a specific number of
bytes, such as 32, the only risk is that two identifiers are not
unique within 32 bytes, i.e. 16 characters in Greek or Cyrillic or 10
characters (at three bytes each) in some East Asian script.
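The truncation risk can be sketched in a few lines of Python. This assumes a hypothetical linker that compares only the first 32 bytes of each UTF-8 encoded symbol name; the limit and the helper names are illustrative, not taken from any particular toolchain.

```python
SIGNIFICANT_BYTES = 32  # assumed linker limit, for illustration only


def significant_prefix(name: str, limit: int = SIGNIFICANT_BYTES) -> bytes:
    """Return the part of the UTF-8 encoded name the linker would compare."""
    return name.encode("utf-8")[:limit]


def collides(a: str, b: str, limit: int = SIGNIFICANT_BYTES) -> bool:
    """True if two different names look identical within the byte limit."""
    return a != b and significant_prefix(a, limit) == significant_prefix(b, limit)


# Two 17-character Cyrillic names that differ only in the last character:
# the first 16 characters already fill all 32 bytes, so they collide.
print(collides("д" * 16 + "а", "д" * 16 + "б"))

# The same pattern in ASCII needs only 17 bytes, so the names stay distinct.
print(collides("d" * 16 + "a", "d" * 16 + "b"))

# Whole characters that fit into the limit at 3 bytes per character.
print(SIGNIFICANT_BYTES // len("語".encode("utf-8")))
```

Under these assumptions, names are guaranteed unique only if they already differ within the first 16 Cyrillic or Greek characters, or the first 10 three-byte characters.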
Paul