Re: utf8 silly question



Salut, Catalin

You can first convert your string c to unicode and, in the process,
specify an encoding that covers non-ASCII characters. If you don't
specify an encoding, Python falls back to the default, which is most
likely ASCII, and you get the error you mentioned. In the following
example, I specified 'iso-8859-1' as the encoding.

Then you can UTF-8-encode the resulting unicode string via the codecs
module (calling c.encode('utf-8') on it gives the same result).

Here's a snippet of code (note the error when I don't pass an encoding
explicitly and the ASCII default is used):

Python 2.4 (#1, Nov 30 2004, 16:42:53)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> c = unicode(chr(169)+" some text")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 0: ordinal not in range(128)
>>> c = unicode(chr(169)+" some text", 'iso-8859-1')
>>> print c
© some text
>>> import codecs
>>> print codecs.encode(c, 'utf-8')
© some text
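
In case you are on Python 3, here is a rough sketch of the same two steps. It is my own translation of the session above, not something from the original exchange, and it assumes that str is already unicode and that raw bytes are a separate bytes type. The names raw, text, utf8_bytes and ascii_error are just illustrative.

```python
# Python 3 sketch of the same idea (assumption: the session above was Python 2.4).
# Byte 0xA9 (169) is the copyright sign in ISO-8859-1.
raw = bytes([169]) + b" some text"

# Decoding with the ASCII codec fails, just like unicode() without an encoding did.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Step 1: decode the bytes with an encoding that covers byte 0xA9.
text = raw.decode("iso-8859-1")

# Step 2: encode the resulting text as UTF-8.
utf8_bytes = text.encode("utf-8")

print(ascii_error)
print(repr(utf8_bytes))  # the copyright sign becomes the two bytes 0xC2 0xA9
```

The point is the same in both versions: decode with the encoding the bytes were actually written in, then encode to UTF-8 as a separate step.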
