Re: i18n'ed Character Set in DBMS and tables

From: Mark Yudkin (myudkinATcompuserveDOTcom_at_nospam.org)
Date: 09/10/04


Date: Fri, 10 Sep 2004 08:06:05 +0200

You may speak 3 Western European languages, but all three of these can be
encoded within a single 8-bit "Western European" character set (ISO 8859-1,
Latin-1), and they do not have seriously conflicting sort orders. Mixing German
and Hebrew would be somewhat messier, although it is rather common, even within
a single document. Of course, bidirectionality adds yet another complexity that
you haven't considered.
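A minimal Java sketch of the two points above, using only JDK classes (CharsetEncoder.canEncode and java.text.Collator): a German/Spanish string fits ISO 8859-1 while a German/Hebrew one does not, and the order of the same stored strings depends on the collation rules applied, not on the bytes. The class name and sample words are made up for illustration, and a real DBMS applies its own collation, which may differ from the JDK's.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CharsetAndCollation {

    /** True if every character of the text can be encoded in ISO 8859-1 (Latin-1). */
    public static boolean fitsLatin1(String text) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    /** Sorts by raw UTF-16 code units, which is what String.compareTo does. */
    public static List<String> binarySort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts with the JDK's collation rules for the given locale. */
    public static List<String> collatedSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // German + Spanish: every character exists in Latin-1.
        System.out.println(fitsLatin1("Gr\u00FC\u00DFe, se\u00F1or"));
        // German + Hebrew ("shalom"): Hebrew letters are not in Latin-1.
        System.out.println(fitsLatin1("Gr\u00FC\u00DFe, \u05E9\u05DC\u05D5\u05DD"));

        List<String> words = Arrays.asList("Zebra", "\u00C4pfel", "Apfel");
        // Code-unit order puts U+00C4 (A-umlaut) after every ASCII letter.
        System.out.println(binarySort(words));
        // German collation treats A-umlaut as a variant of A.
        System.out.println(collatedSort(words, Locale.GERMAN));
    }
}
```

The first check prints true and the second false. Binary order gives Apfel, Zebra, Äpfel, while German collation gives Apfel, Äpfel, Zebra. Swedish, by contrast, treats Ä as a separate letter after Z, which is exactly the kind of conflicting sort order that mixing languages in one column brings in.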

We use the two-letter ISO country code (ISO 3166); that's what is generally
used in international financial reporting. We also use GESMES (UN/EDIFACT)
(www.unece.org, cf. http://www.unece.org/trade/untdid/d99b/trmd/gesmes_c.htm),
but that's less concerned with languages and more with data exchange. Locales,
as we use them, are those of the W3C Internationalization activity
(http://www.w3.org/International/). The "problem" (e.g. in Java's model) is
that just because a user lives in some country or locale, he does not
necessarily use the language(s) defined for it, so he is forced to lie about
his locale in order to get the desired language. Microsoft, BTW, worked this
out and fixed the problem in Windows 2000. My IE6 browser is set up for
"German (Switzerland) [de-ch]" by default, with the browser language set to
English; there is no "English (Switzerland) [en-ch]". Except for a few sites
which seem to believe that I have to be presented with the language of "my
locale", I have no problems (Google uses it as a default, but lets me override
it and saves my configuration).
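A minimal Java sketch of keeping "where the user lives" separate from "which language the user reads", so nobody has to lie about their locale. It assumes a hypothetical application translated into German, French, Italian and English (the four languages mentioned later in the thread) with English as fallback. Locale.Category (Java 7) and Locale.LanguageRange/Locale.lookup (Java 8, RFC 4647 lookup) are real JDK APIs, though they postdate this post. The Accept-Language value is an example.

```java
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UiLanguage {

    /** Languages the (hypothetical) application has translations for. */
    public static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("de"), Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("it"), Locale.forLanguageTag("en"));

    /**
     * Picks the UI language from an Accept-Language style priority list
     * using RFC 4647 lookup, independently of the user's country.
     * Falls back to English (an assumption) if nothing matches.
     */
    public static Locale pickLanguage(String acceptLanguage) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : Locale.ENGLISH;
    }

    public static void main(String[] args) {
        // Someone living in Switzerland (formatting conventions: de-CH)
        // who prefers English for the user interface.
        Locale format = Locale.forLanguageTag("de-CH");
        Locale display = pickLanguage("en,de-CH;q=0.8");

        // The JVM default locale can be split by category.
        Locale.setDefault(Locale.Category.FORMAT, format);
        Locale.setDefault(Locale.Category.DISPLAY, display);

        System.out.println("UI language: " + display.getDisplayLanguage(display));
        System.out.println("Number: " + NumberFormat.getNumberInstance(format).format(1234567.891));
    }
}
```

Lookup truncates each range before moving to the next one, so "fr-CH" still finds the French translation. The exact Swiss number formatting printed depends on the JDK's locale data.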

Fortunately, you don't work with me, so I have no need to explain that mixed-
language documents rule out separate per-language tables, even if the
underlying design of vertically partitioning information without maintaining
the partitioning key were not totally wrong. It will be your boss's problem to
clean up the chaos you leave behind.
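A minimal sketch of the alternative implied above: one table in which the language is kept as an ordinary column, instead of one table per (language, locale) combination. The table and column names (TEXT_SEGMENT, item_id, seq, lang, body) are hypothetical, and the table is simulated with plain Java collections so the example stays self-contained. A mixed-language document is simply several rows with different lang values under the same item_id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * In-memory stand-in for a single hypothetical table
 *   TEXT_SEGMENT(item_id, seq, lang, body), primary key (item_id, seq)
 * (the key is not enforced by this sketch). The language is a column,
 * not the choice of table.
 */
public class TextStore {

    /** One row of the hypothetical TEXT_SEGMENT table. */
    public static final class Row {
        public final long itemId; // document the segment belongs to
        public final int seq;     // position of the segment in that document
        public final String lang; // BCP 47 language tag, e.g. "de-CH" or "he"
        public final String body;

        Row(long itemId, int seq, String lang, String body) {
            this.itemId = itemId;
            this.seq = seq;
            this.lang = lang;
            this.body = body;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    /** INSERT; the tag is normalised so "de-ch" and "de-CH" compare equal. */
    public void add(long itemId, int seq, String lang, String body) {
        String tag = Locale.forLanguageTag(lang).toLanguageTag();
        rows.add(new Row(itemId, seq, tag, body));
    }

    /** Like SELECT ... WHERE item_id = ? ORDER BY seq: the whole document in reading order. */
    public List<Row> document(long itemId) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.itemId == itemId) out.add(r);
        }
        out.sort((a, b) -> Integer.compare(a.seq, b.seq));
        return out;
    }

    /** Like SELECT ... WHERE lang = ?: per-language access is a filter, not a separate table. */
    public List<Row> inLanguage(String lang) {
        String tag = Locale.forLanguageTag(lang).toLanguageTag();
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.lang.equals(tag)) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        TextStore store = new TextStore();
        // One document mixing Swiss German and Hebrew, inserted out of order.
        store.add(1, 1, "he", "\u05E9\u05DC\u05D5\u05DD");
        store.add(1, 0, "de-ch", "Gr\u00FCezi");
        store.add(2, 0, "en", "Quarterly summary");
        for (Row r : store.document(1)) {
            System.out.println(r.seq + " " + r.lang);
        }
        System.out.println(store.inLanguage("de-CH").size());
    }
}
```

With separate per-language tables, reassembling document 1 in reading order would need a union across tables plus some extra key to restore the order; here it is a single filtered, ordered query.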

Since this conversation is a waste of time, this is my last response.

Dr Mark Yudkin

"Albretch" <lbrtchx@hotmail.com> wrote in message
news:f8544ad2.0409091355.4ce16149@posting.google.com...
> Usually I stop paying attention to people when they start getting
> personal. However I think our talk has been constructive for the most
> part.
>
> > How many of your 3 languages do you speak, read and write fluently?
> AM: I would say all three of them. Spanish is my mother tongue; I
> studied in Germany, graduating with a Master's in Math/Physics, and have
> lived in the US for ten years.
>
> > How many of these use non-Latin characters?
> AM: Do you mean latin-1/ISO-8859-1? Spanish and German use a few.
>
> > Are you aware of how many languages there are in the world?
> AM: Pretty much, if 'a whole lot' would qualify as an answer to you
> :-)
>
> > Have you considered that some languages have multiple writing systems,
> > even going so far as to use different alphabets, in different locales?
> AM: Yes, I have.
>
> > What about users whose language and locale don't mix?
> AM: What do you f*ck&ng mean? Are you using the terms 'language' and
> 'locale' as a free speech kind of thing or as defined technical
> standards?
>
> The Java API does a fine job of functionally describing both terms
>
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/Locale.html
>
> if you understand Java/OOP; the fact that there is no Locale
> constructor without a specified language would tell you something.
>
> The language argument is a valid ISO Language Code. These codes are
> the lower-case, two-letter codes as defined by ISO-639. You can find a
> full list of these codes at a number of sites, such as:
> http://www.loc.gov/standards/iso639-2/englangn.html
> The country argument is a valid ISO Country Code. These codes are the
> upper-case, two-letter codes as defined by ISO-3166. You can find a
> full list of these codes at a number of sites, such as:
> http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
>
> > Here I am, posting in English, and living in the Swiss German locale. My
> > keyboard is a Swiss German one, my Windows is an English one.
> AM: . . . and your browser settings are?
>
>
> > Swiss German doesn't even use the same alphabetic characters for writing
> > as Germany does for "German German".
> AM: The differences are minimal indeed. I have spoken to Swiss
> German people and we have understood each other 'einwandfrei'
> (flawlessly). I would even dare to say that American and British English
> differ more, and people use both without paying much attention to the
> differences.
>
> > Also, did you actually understand Codd's normalization? Why are you
> > confusing type and interpretation?
> AM: . . . because it affects the sort order (even down to a
> physical level) and how fast the table is sampled by select statements
> that include these columns in the 'order by' clause.
>
> Dr. Codd's Rule #1: All information in a relational database is
> represented explicitly at the logical level in exactly one way: by
> values in tables. And, the data in each field is assumed to be atomic;
> that is, the smallest bit of useful information -- a single value.
>
> Think about this phrase in his statement, "the smallest bit of useful
> information" . . . and you will understand what I mean.
>
> > There is absolutely no way I would use a separate table for each (language,
> > locale) combination. And I'm speaking as somebody who develops software that
> > supports 4 languages (simultaneously) for a living (the three main national
> > languages of this country: German, French and Italian, plus English), and
> > stores additional languages in the database (international financial and
> > economic data from OECD, BIS, World Bank, etc.).
>
> AM: Wow! If you paid me to, or if I had the time to do it, I would
> technically prove my point to you, but since you are the one who gains
> something by understanding it, and I think you are quite capable
> of showing it to yourself (if you 'want to' see it), I will leave it
> to you as 'homework'.
>
> "Mark Yudkin" <myudkinATcompuserveDOTcom@nospam.org> wrote in message
> news:<chosn9$ij9$1@ngspool-d02.news.aol.com>...
> > How many of your 3 languages do you speak, read and write fluently? How many
> > of these use non-Latin characters? Are you aware of how many languages there
> > are in the world? Have you considered that some languages have multiple
> > writing systems, even going so far as to use different alphabets, in
> > different locales? What about users whose language and locale don't mix?
> > Here I am, posting in English, and living in the Swiss German locale. My
> > keyboard is a Swiss German one, my Windows is an English one. Swiss German
> > doesn't even use the same alphabetic characters for writing as Germany does
> > for "German German".
> >
> > Also, did you actually understand Codd's normalization? Why are you
> > confusing type and interpretation?
> >
> > There is absolutely no way I would use a separate table for each (language,
> > locale) combination. And I'm speaking as somebody who develops software that
> > supports 4 languages (simultaneously) for a living (the three main national
> > languages of this country: German, French and Italian, plus English), and
> > stores additional languages in the database (international financial and
> > economic data from OECD, BIS, World Bank, etc.).
> >
> > You want to support all languages, locales and scripts. But you don't appear
> > to have the faintest idea of the problems involved.
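The Locale Javadoc quoted in the thread says the language argument should be a two-letter ISO 639 code and the country a two-letter ISO 3166 code, but the Locale constructors do not check this (new Locale("xx", "YY") is accepted). A minimal sketch of a strict factory, assuming you want that check: it validates against the JDK's own Locale.getISOLanguages() and Locale.getISOCountries() lists and then builds the locale with Locale.Builder. The class and method names are hypothetical. Note that a combination such as en-CH is perfectly valid here, whatever a particular browser offers in its menu.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StrictLocale {

    // Two-letter ISO 639 language codes and ISO 3166 country codes known to this JDK.
    private static final Set<String> LANGUAGES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));
    private static final Set<String> COUNTRIES =
            new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    /**
     * Builds a Locale only if both codes are in the JDK's ISO lists.
     * Case is normalised first: lower-case language, upper-case country.
     */
    public static Locale create(String language, String country) {
        String lang = language.toLowerCase(Locale.ROOT);
        String region = country.toUpperCase(Locale.ROOT);
        if (!LANGUAGES.contains(lang)) {
            throw new IllegalArgumentException("not an ISO 639 language code: " + language);
        }
        if (!COUNTRIES.contains(region)) {
            throw new IllegalArgumentException("not an ISO 3166 country code: " + country);
        }
        return new Locale.Builder().setLanguage(lang).setRegion(region).build();
    }

    public static void main(String[] args) {
        System.out.println(create("de", "CH").toLanguageTag());
        System.out.println(create("en", "ch").toLanguageTag());
        try {
            create("xx", "CH");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The JDK's ISO lists are a snapshot shipped with each release, so a production system that must track the standards closely might prefer its own maintained code table.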


