RE: i18n pot file

From: anabell (anabell_at_sh163a.sta.net.cn)
Date: 11/02/03


Date: Sun, 2 Nov 2003 14:37:03 +0800
To: "Python-List \(E-mail\)" <python-list@python.org>

I was unable to use charset utf-8 because I got this error message when I
tried to run my localized application:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

Looking for an alternative, I tried writing a gb2312 codec, which is not
included with Python's standard distribution. I downloaded a GB2312 character
map and put it in a gb2312.py codec file. It worked well (Simplified
Chinese characters are displayed).

I wonder, though, how I can make UTF-8 work, since it supports every language.
I looked in my python23/lib/encodings/ directory and found that a utf_8 codec
exists. I changed the charset in my .po file to utf_8 and generated its .mo
file. When I ran my localized application, Python's gettext module recognized
the charset 'utf8', but the error occurred as soon as it started decoding the
.mo file.
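Editing the charset line alone does not convert any bytes: if the msgstr entries were typed in GB2312, the file still holds GB2312 bytes after the header claims UTF-8. Below is a minimal sketch of re-encoding a whole .po file. It assumes a modern Python 3 (which, unlike the Python 2.3 used here, ships a gb2312 codec) and a source file that is entirely GB2312. `recode_po` is a hypothetical helper name, not part of any library.

```python
import re


def recode_po(src: bytes, old_charset: str = "gb2312") -> bytes:
    """Hypothetical helper: re-encode a .po file from `old_charset` to UTF-8.

    Assumes every byte of `src` is valid in `old_charset`; a file with mixed
    encodings raises UnicodeDecodeError here instead of producing garbage.
    """
    text = src.decode(old_charset)
    # Point the header at the new encoding; only the first "charset=" is touched,
    # which is the one in the header entry (msgid "") at the top of the file.
    text = re.sub(r"charset=[\w-]+", "charset=UTF-8", text, count=1)
    return text.encode("utf-8")


# A tiny .po file as it might look when saved in GB2312.
po_gb2312 = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=gb2312\\n"\n'
    '"Content-Transfer-Encoding: 8bit\\n"\n'
    "\n"
    'msgid "Chinese"\n'
    'msgstr "中文"\n'
).encode("gb2312")

po_utf8 = recode_po(po_gb2312)
# Each Chinese character takes 2 bytes in GB2312 but 3 bytes in UTF-8.
print(len(po_gb2312), "->", len(po_utf8), "bytes")
```

After converting, the .mo file has to be regenerated (for example with msgfmt, or Python's Tools/i18n/msgfmt.py) so the compiled catalog also contains UTF-8 bytes.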

I opened the utf_8.py codec file and found no character map. Could it be
using the wrong map?
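The utf_8 codec has no character map because UTF-8 is computed from code points rather than looked up in a table, so a missing map is not the cause. The error is consistent with the point made in the quoted reply: the catalog must actually contain UTF-8 bytes. The sketch below assumes a modern Python 3. It builds a tiny .mo file in memory with `build_mo`, a hypothetical helper that writes only the minimal little-endian GNU .mo layout (no hash table). It then loads the file with the standard `gettext.GNUTranslations` class twice: once with UTF-8 bytes, and once with GB2312 bytes under the same `charset=UTF-8` header.

```python
import gettext
import io
import struct

HEADER = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)


def build_mo(messages: dict, header: str) -> bytes:
    """Hypothetical helper: pack {msgid: msgstr_bytes} into a little-endian .mo file.

    msgstr values are raw bytes so the caller controls which encoding ends up
    in the file, which is exactly what goes wrong in a mislabelled catalog.
    """
    entries = {b"": header.encode("ascii")}  # msgid "" carries the header
    entries.update({k.encode("ascii"): v for k, v in messages.items()})
    keys = sorted(entries)                   # msgids are stored sorted; b"" first
    n = len(keys)
    orig_off = 28                            # after 7 four-byte header words
    trans_off = orig_off + 8 * n             # each table holds n (length, offset) pairs
    data_start = trans_off + 8 * n
    data = b""
    pairs = []
    # All originals first, then all translations, matching the two tables.
    for blob in keys + [entries[k] for k in keys]:
        pairs.append((len(blob), data_start + len(data)))
        data += blob + b"\0"                 # strings are NUL-terminated
    out = struct.pack("<7I", 0x950412DE, 0, n, orig_off, trans_off, 0, 0)
    for length, offset in pairs:
        out += struct.pack("<2I", length, offset)
    return out + data


text = "中文"

# Header says UTF-8 and the bytes really are UTF-8: loads fine.
good = build_mo({"Chinese": text.encode("utf-8")}, HEADER)
catalog = gettext.GNUTranslations(io.BytesIO(good))
print(catalog.gettext("Chinese") == text)  # True

# Header says UTF-8 but the bytes are GB2312: gettext fails while parsing.
bad = build_mo({"Chinese": text.encode("gb2312")}, HEADER)
try:
    gettext.GNUTranslations(io.BytesIO(bad))
except UnicodeDecodeError as exc:
    print("decoding failed:", exc)
```

The second case reproduces the kind of UnicodeDecodeError reported above, even though the utf_8 codec itself is working correctly.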

> It depends. The best CHARSET to use is UTF-8,
> but of course you have to enter UTF-8 data into
> the po file. You can write all languages in UTF-8.
> There are also charsets specific to a language
> (or language group) which you may prefer, for example:
> Big5 for Traditional Chinese
> gb2312 for Simplified Chinese
> ISO-8859-15 for western European languages
>
> The ENCODING should be 8bit in all cases
>
> The handiest thing to do is to look at examples:
> http://www2.iro.umontreal.ca/~gnutra/registry.cgi?team=zh_CN
>
> Maybe you could even practice on my application :-)
> http://www2.iro.umontreal.ca/~gnutra/registry.cgi?domain=fslint
>
> Pádraig.

> > anabell wrote:
> > Hi, I'm trying to localize into Chinese. In the pot file header,
> > the following appears:
> >
> > "POT-Creation-Date: Thu Oct 16 17:07:14 2003\n"
> > "PO-Revision-Date: 2003-10-16 HO:MI+ZONE\n"
> > "Last-Translator: Anabell chan <achan@mail.design.com>\n"
> > "Language-Team: LANGUAGE <LL@li.org>\n"
> > "MIME-Version: 1.0\n"
> > "Content-Type: text/plain; charset=CHARSET\n"
> > "Content-Transfer-Encoding: ENCODING\n"
> > "Generated-By: pygettext.py 1.5\n"
> > What should I fill in for 'CHARSET' and 'ENCODING'?
>
>
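Following the reply's advice (UTF-8 as the CHARSET, 8bit as the ENCODING), the two placeholders in the pygettext header can be filled mechanically. A small sketch, where `fill_po_header` is a hypothetical helper and the input is the header shown in the question:

```python
def fill_po_header(po_text: str, charset: str = "UTF-8") -> str:
    """Hypothetical helper: replace pygettext's CHARSET and ENCODING placeholders."""
    po_text = po_text.replace("charset=CHARSET", "charset=" + charset, 1)
    # Per the advice above, the transfer encoding is 8bit whatever the charset.
    return po_text.replace(
        "Content-Transfer-Encoding: ENCODING", "Content-Transfer-Encoding: 8bit", 1
    )


header = (
    '"MIME-Version: 1.0\\n"\n'
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
    '"Content-Transfer-Encoding: ENCODING\\n"\n'
)
print(fill_po_header(header))
```

Passing charset="gb2312" or "Big5" works the same way, but then the msgstr text must really be saved in that encoding.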


