AI-285 - Comment from Unicode list

From: David Starner (dvdeug_at_email.ro)
Date: 02/15/04

    Date: Sun, 15 Feb 2004 00:25:11 GMT
    
    

    Markus Scherer writes (at
    <http://www.unicode.org/mail-arch/unicode-ml/y2004-m01/0508.html>):

    ***
    D. Starner wrote:
    >> #12 UTF-16 for Processing
    >
    > This is incorrect in saying that Ada uses UTF-16. It supports UCS-2
    > only. The text of the standard says:
    >
    > The predefined type Wide_Character is a character type whose values
    > correspond to the 65536 code positions of the ISO 10646 Basic
    > Multilingual Plane (BMP). [...]
    >
    > which doesn't include surrogate code points. The next

    True, but not much different/worse than for Java, for example. Once you have 16-bit types and string
    literals, adding a few functions to deal with supplementary code points is not hard. We did this for
    Java in ICU4J.

    There is little difference for a language between supporting UCS-2 or UTF-16 because where functions
    do not handle supplementary code points, they usually also don't handle Unicode versions above 3.0 -
    so string case mappings etc. are the same.

    A language like that can be relatively easily upgraded to full UTF-16 handling by updating the
    character and string function implementations, and adding a few new APIs - that is what Java is
    doing. The upgrade is done naturally when the standard functions are extended to Unicode 3.1 or later.

    As such, whether the strings contain UCS-2 or UTF-16 depends less on the language definition and
    more on the functions that are used, and the version of the standard libraries.

    > version of Ada will have 32-bit characters to fully
    > support Unicode - the text of the proposal is here:
    >
    > <http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00285.TXT?rev=1.14>
    >
    > plus lengthy discussion on the issues.

    Thank you very much for the link.

    The proposal seems to be to continue to treat Wide strings as UCS-2, and to treat Wide_Wide strings
    (a new type) as UTF-32. This would give Ada a total of three different native string types on the
    language level. It would also mean that existing code, using 16-bit strings, would not benefit from
    an upgrade but would instead have to be rewritten for support of supplementary code points. This may
    in fact slow down such support.

    There will be a presentation of the choices for Java (including UTF-32) at IUC 25.

    Best regards,
    markus

    ***
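
As a rough illustration of the quoted remark that, given 16-bit types, "adding a few functions to deal with supplementary code points is not hard", here is a minimal Python sketch. It is not ICU4J's or Ada's API; the helper names (is_high_surrogate, is_low_surrogate, to_code_units, code_points) are hypothetical and chosen only for this example. A string is modelled as a list of 16-bit code units, and well-formed surrogate pairs are combined into single code points.

```python
# Sketch only: treats a "wide string" as a list of 16-bit code units.
# All function names here are hypothetical, not taken from any library.

def is_high_surrogate(unit):
    """True if the 16-bit unit is a high (leading) surrogate."""
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit):
    """True if the 16-bit unit is a low (trailing) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF

def to_code_units(text):
    """Encode a Python string as a list of UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def code_points(units):
    """Decode 16-bit units into code points, pairing surrogates.

    An unpaired surrogate is passed through unchanged, which is
    what a UCS-2-only routine would do with every unit.
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Combine the pair into one supplementary code point.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result

units = to_code_units("A\U00010400")
print(len(units), [hex(cp) for cp in code_points(units)])
```

The storage is the same in both readings; only the functions that walk the units differ, which is the sense in which the UCS-2 versus UTF-16 distinction depends on the library rather than on the 16-bit type itself.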

