Re: Unicode literals and byte string interpretation.
- From: Chris Angelico <rosuav@xxxxxxxxx>
- Date: Fri, 28 Oct 2011 14:38:39 +1100
On Fri, Oct 28, 2011 at 2:05 PM, Fletcher Johnson <flt.johnson@xxxxxxxxx> wrote:
If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd' how does
this creation process interpret the bytes in the byte string? Does it
assume the string represents a utf-16 encoding, a utf-8 encoding,
etc...?
For reference the string is これは in the 'shift-jis' encoding.
Encodings define how characters are represented as bytes. What you're
probably looking for is a byte string containing those hex values,
which you can then decode into a Unicode string:
>>> a=b'\x82\xb1\x82\xea\x82\xcd'
>>> unicode(a,"shift-jis") # use 'str' instead of 'unicode' in Python 3
u'\u3053\u308c\u306f'
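The u'....' notation is for Unicode strings, which are not encoded in
any way. Inside such a literal, each \xNN escape simply names the
code point U+00NN, so u'\x82\xb1\x82\xea\x82\xcd' is six characters
in the U+0080 to U+00FF range, not three Japanese ones; no encoding is
assumed at all. The u'\u3053\u308c\u306f' result above is a valid way
of entering the intended string in your source code, identifying
Unicode characters by their code points.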
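A minimal sketch of the same distinction, assuming Python 3 (where str is the Unicode type and bytes is separate), contrasts decoding the bytes with a codec against writing the same \x escapes inside a string literal. The variable names are purely illustrative.

```python
# Sketch for Python 3, where str is Unicode and bytes is a separate type.

# The Shift-JIS encoded bytes from the question.
raw = b'\x82\xb1\x82\xea\x82\xcd'

# Decoding applies a codec: two bytes per character here, three characters total.
text = raw.decode('shift_jis')
print([hex(ord(ch)) for ch in text])  # ['0x3053', '0x308c', '0x306f']

# A str literal with \x escapes involves no codec at all:
# each \xNN is simply the code point U+00NN.
literal = '\x82\xb1\x82\xea\x82\xcd'
print(len(literal))  # 6

# Code points U+0000..U+00FF map one-to-one onto bytes under Latin-1,
# so a mistyped literal can be turned back into the intended bytes.
recovered = literal.encode('latin-1').decode('shift_jis')
print(recovered == text)  # True

# Encoding goes the other way: str -> bytes with a named codec.
print(text.encode('shift_jis') == raw)  # True
```

The Latin-1 round trip works only because every escape in the literal is below U+0100; it is a rescue trick, and writing a bytes literal (or the \u escapes) in the first place is the cleaner option.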
ChrisA