Re: Unicode problems, yet again



Jp Calderone wrote:

You don't have a string fetched from a database, in iso-8859-2, alas. That is the root of the problem you're having. What you have is a unicode string.

Yes, you're right :) I actually did have iso-8859-2 data, but, as I found out late last night, the data got converted to unicode along the way.


Thanks to all who replied so quickly :)

(Does anyone else feel that Python's unicode handling is, well... suboptimal at least?)

Hmm. Not really. The only problem I've found with it is the misguided attempt to "do the right thing" by implicitly encoding unicode strings, and this isn't much of a problem once you figure things out, because you can always do things explicitly and avoid invoking the implicit behavior.
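
To illustrate what "doing things explicitly" can look like, here is a minimal sketch in Python 3 syntax, where byte strings and text are separate types. The sample word and the names `raw`, `text`, `upper`, and `out` are made up for illustration; only the iso-8859-2 encoding comes from this thread.

```python
# Hypothetical sample: bytes as they might arrive from an iso-8859-2 source.
raw = b"\xbf\xf3\xb3w"  # the Polish word "żółw" encoded in iso-8859-2

# Decode explicitly at the input boundary: bytes -> text.
text = raw.decode("iso-8859-2")

# Work with text inside the program.
upper = text.upper()

# Encode explicitly at the output boundary: text -> bytes.
out = upper.encode("iso-8859-2")
```

Because every conversion names its encoding, nothing depends on whatever default the interpreter would otherwise pick.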

I'm learning that, the hard way :)

One thing that I have always wanted to do (but which probably can't be done?) is to set the default/implicit encoding to the one I'm using... I often have to deal with 8-bit encodings and rarely with unicode. Can it be done per-program?
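
A rough sketch of what a per-program setting might look like without touching any interpreter-wide default: name the encoding once and route every conversion through it. This is Python 3 syntax, and `APP_ENCODING`, `to_text`, and `to_bytes` are hypothetical names, not part of any library.

```python
# Assumption: the one 8-bit encoding this particular program deals with.
APP_ENCODING = "iso-8859-2"

def to_text(data, encoding=APP_ENCODING):
    """Decode bytes to text; pass text through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

def to_bytes(text, encoding=APP_ENCODING):
    """Encode text to bytes; pass bytes through unchanged."""
    if isinstance(text, str):
        return text.encode(encoding)
    return text
```

The conversions stay explicit, but the encoding name lives in one place per program.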
