Re: Improving i18n support (Was: Re: Christmas



Larry W. Virden wrote:
The most surprising thing to me, as a developer, is that I'm expected
to use obscure binary/octal/hex codes when attempting to write Unicode
strings. That seems so... archaic.

That's surely a restriction of whatever editor you're using, not of Tcl.

Message catalogs are encoded in UTF-8, so you've always been able to use
Unicode there (well, since 8.1, when Tcl first became Unicode-aware).
On this point, I know whereof I speak: I've committed several message
catalogs for KHIM in the last few months, including one for Czech,
which uses not only non-ASCII characters but also characters outside
Latin-1.

While the [source] command defaults to the platform-native encoding,
and [source -encoding] and the '-encoding' command-line option are 8.5
features, you can also do:

set file [open $path r]
fconfigure $file -encoding utf-8;# or whatever encoding you like
eval [read $file]
close $file

which does much the same thing as [source] but is encoding-aware.
Well, [info script] doesn't work inside the evaluated script, and there
are a few other niggling details, but everything can be worked around.
So Unicode is available in ordinary Tcl scripts as well.
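
For what it's worth, the workaround above can be wrapped in a small
helper. This is only a sketch: the name sourceWithEncoding is made up
for illustration, and it assumes Tcl 8.4 or later, where [info script]
accepts a filename argument. It papers over the [info script] problem
but makes no attempt at the other niggling details (error traces, for
instance, will not look like those from a real [source]).

```tcl
# Hypothetical helper (not part of Tcl): evaluate a script file that is
# read with an explicit encoding, defaulting to UTF-8.
proc sourceWithEncoding {path {encoding utf-8}} {
    set chan [open $path r]
    fconfigure $chan -encoding $encoding
    set script [read $chan]
    close $chan

    # Make [info script] report $path while the script runs
    # (assumes Tcl 8.4+), then restore the previous value.
    set previous [info script]
    info script $path
    set code [catch {uplevel 1 $script} result]
    info script $previous

    # Like [source], treat a [return] inside the script as a normal end.
    if {$code == 2} {
        set code 0
    }
    return -code $code $result
}
```

You would then call it as "sourceWithEncoding $path" wherever you
would otherwise have used [source $path].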

I quite routinely put non-ASCII characters in my string literals
nowadays.
--
73 de ke9tv/2, Kevin