Re: i18n'ed Character Set in DBMS and tables

From: Albretch (lbrtchx_at_hotmail.com)
Date: 09/06/04

    Date: 5 Sep 2004 17:24:49 -0700
    
    

    "Mark Yudkin" <myudkinATcompuserveDOTcom@nospam.org> wrote in message news:<chfb5k$1bn$1@ngspool-d02.news.aol.com>...
    . . .
    > You don't. I18n is designed to permit you to SELECT ONE SPECIFIC language
    > and process that correctly. To store and retrieve "ALL" languages, you use
    > Unicode.
    . . .

     Well, I see 'issues' right there. I think your approach to it is
    wrong 'by design and implementation' and this is what I am trying to
    avoid.

     Let's say you can set the character set and collation all the way
    down to the column. Now, if you 'use Unicode to store and retrieve
    "ALL" languages' (as you suggest), then, since (naturally, and per the
    ANSI SQL spec) you can only set one character set and its related
    collation on a column, and since AFAIK (and I can see why it is not
    possible) you cannot specify/change the collation on the fly when you
    run a SELECT query with an ORDER BY clause, every language stored in
    that column ends up sorted by one and the same collation.
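On the 'on the fly' point: the SQL standard has a COLLATE clause that can be attached to an ORDER BY sort key, and several engines (PostgreSQL, MySQL and SQLite among them) accept it. At least in those engines the collation can be picked per query rather than only per column. Below is a minimal sketch using Python's built-in sqlite3 module. The collation names `sv` and `de` and their rules are hypothetical toy tailorings made up for illustration, not real locale data.

```python
import sqlite3

# Hypothetical toy "tailorings", for illustration only; real systems use
# ICU / the Unicode Collation Algorithm rather than hand-written rules.
SV_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # Swedish-style: å, ä, ö after z
SV_RANK = {ch: i for i, ch in enumerate(SV_ALPHABET)}
# German-style: an umlaut sorts as its base letter.
DE_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def _cmp(ka, kb):
    # SQLite collation callbacks must return a negative, zero or positive int.
    return (ka > kb) - (ka < kb)


def collate_sv(a, b):
    def key(s):
        # Characters outside the toy alphabet sort after it, by code point.
        return [SV_RANK.get(ch, len(SV_ALPHABET) + ord(ch)) for ch in s.lower()]
    return _cmp(key(a), key(b))


def collate_de(a, b):
    return _cmp(a.lower().translate(DE_FOLD), b.lower().translate(DE_FOLD))


conn = sqlite3.connect(":memory:")
conn.create_collation("sv", collate_sv)
conn.create_collation("de", collate_de)
conn.execute("CREATE TABLE NamesTable (SrName TEXT, FName TEXT)")
conn.executemany(
    "INSERT INTO NamesTable VALUES (?, ?)",
    [("Zimmer", "Anna"), ("Öhler", "Berit"), ("Otto", "Carl"), ("Adler", "Dora")],
)


def surnames(collation):
    # Collation names are identifiers, not values, so they cannot be bound
    # as ? parameters; only accept names registered above.
    if collation not in ("sv", "de", "BINARY"):
        raise ValueError(f"unknown collation: {collation}")
    sql = f"SELECT SrName FROM NamesTable ORDER BY SrName COLLATE {collation}"
    return [row[0] for row in conn.execute(sql)]


print(surnames("de"))  # ['Adler', 'Öhler', 'Otto', 'Zimmer']
print(surnames("sv"))  # ['Adler', 'Otto', 'Zimmer', 'Öhler']
```

The same column holds the same data, yet the query produces two different orders. In SQLite a custom collation is registered per connection, so every connection that sorts with `sv` or `de` has to register it first.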

     Unless, as in Unicode, collation is simply not necessary because there
    is a 1:1 map between the character set and the collation order for all
    languages, which to me sounds really unnatural. (Unicode, by the way, I
    see as a good technical example of waste: trying to extensively
    ASCII-ize all natural languages. It is the weakest/silliest 'standard'
    I know of.)
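The character-set vs. collation distinction is easier to see with an example. Sorting Unicode text by raw code point gives an order that no language uses, so even an all-Unicode column still needs a collation on top of it. Unicode's own answer is a separate specification, the Unicode Collation Algorithm (UTS #10). The sketch below uses only Python's standard library. The `fold` key is a crude accent- and case-folding stand-in chosen for illustration, not the UCA and not any particular language's rules.

```python
import unicodedata

words = ["zoo", "école", "apple", "Zebra"]

# Plain code-point order: uppercase before lowercase, accented letters after "z".
by_code_point = sorted(words)


def fold(s):
    # Crude stand-in for a real collation: decompose (NFD), drop the
    # combining accent marks, then casefold. NOT the Unicode Collation Algorithm.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


by_folded_key = sorted(words, key=fold)

print(by_code_point)  # ['Zebra', 'apple', 'zoo', 'école']
print(by_folded_key)  # ['apple', 'école', 'Zebra', 'zoo']
```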

     Say you have Korean names and Swahili ones in a table. Do people 'use
    Unicode to store and retrieve "ALL" languages', keep an extra column
    specifying the character set, . . . and then 'SELECT ONE SPECIFIC
    language and process that correctly', à la:

     SELECT SrName, FName FROM NamesTable
    WHERE CHAR_SET_Col = 'Korean_CHAR_SET' ORDER BY SrName, FName;

    and/or

     SELECT SrName, FName FROM NamesTable
    WHERE CHAR_SET_Col = 'Swahili_CHAR_SET' ORDER BY SrName, FName;

     This would be -way- slower than having two separate tables, each with
    its collation-sensitive columns set to the correct char_set + collation
    pair, keeping an index on them and periodically sorting them physically.

     Or?
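Whether the single-table version is actually slower depends on how the engine executes it, and that can be checked with a query plan. Below is a minimal sketch that uses SQLite (via Python's sqlite3) purely as an assumed stand-in for "the DBMS" and reuses the column names from the queries above. The index name `idx_lang_name` is hypothetical. With a composite index on (CHAR_SET_Col, SrName, FName), the equality filter on the language tag should leave the matching rows already in SrName, FName order. One would therefore expect the plan to name the index and to show no separate sort step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NamesTable (SrName TEXT, FName TEXT, CHAR_SET_Col TEXT)")
# Composite index: equality column (language tag) first, then the sort columns.
conn.execute("CREATE INDEX idx_lang_name ON NamesTable (CHAR_SET_Col, SrName, FName)")

query = (
    "SELECT SrName, FName FROM NamesTable "
    "WHERE CHAR_SET_Col = ? ORDER BY SrName, FName"
)

# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [
    row[-1]
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Korean_CHAR_SET",))
]
for step in plan:
    print(step)
```

There is one caveat. In SQLite an index only spares the sort if the ORDER BY uses the same collation the index was built with (plain BINARY here). A per-language `ORDER BY ... COLLATE x` would need an index declared with that same collation.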

