Re: Unicode literals and byte string interpretation.



On Oct 27, 2011, at 11:05 PM, Fletcher Johnson wrote:

> If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd', how does
> this creation process interpret the bytes in the byte string? Does it
> assume the string represents a UTF-16 encoding, a UTF-8 encoding,
> etc.?
>
> For reference, the string is これは in the 'shift-jis' encoding.

Try it and see! One test case is worth a thousand words. And Python has an interactive interpreter. :-)
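
For anyone who wants to run that test case, here is a minimal sketch of what the interpreter shows. It is written for Python 3, which still accepts the u'' prefix; the variable names are only illustrative. The short version is that no encoding is assumed at all: inside a Unicode literal, each \xNN escape names the code point with that value, so the literal holds six characters rather than three. Decoding Shift-JIS requires starting from a byte string and naming the codec explicitly.

```python
# A Unicode literal does not decode anything: each \xNN escape is taken
# as the code point U+00NN, so this string has six characters.
literal = u'\x82\xb1\x82\xea\x82\xcd'
code_points = [ord(ch) for ch in literal]

# To interpret the same values as Shift-JIS, start from bytes and decode
# with the codec named explicitly.
raw = b'\x82\xb1\x82\xea\x82\xcd'
decoded = raw.decode('shift_jis')

# If the mistaken literal is all you have, Latin-1 maps code points
# U+0000..U+00FF one-to-one onto bytes, so it can undo the mistake.
recovered = literal.encode('latin-1').decode('shift_jis')

print(len(literal), [hex(cp) for cp in code_points])
print(len(decoded), decoded == recovered)
```

The Latin-1 round trip is only a recovery trick for this particular mistake; in new code it is simpler to keep the data as bytes until the real encoding is known.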


- Dave.


