Re: unicode by default
- From: harrismh777 <harrismh777@xxxxxxxxxxx>
- Date: Wed, 11 May 2011 20:22:50 -0500
John Machin wrote:
(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).
(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.
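The two quoted points can be sketched in a few lines. This is only an illustration, assuming Python 3 and assuming UTF-8 as the encoding the consumer expects; the variable names are made up for the example.

```python
# Point (2): a code point can be written with an escape sequence, no bytes needed.
text = "\u0404"  # U+0404, a Cyrillic letter

# Point (1): before the text can go to a file or over the network it has to
# be encoded into bytes. UTF-8 is an assumption here, not a requirement.
data = text.encode("utf-8")
print(data)  # b'\xd0\x84'

# One character in the string, two bytes on the wire.
print(len(text), len(data))  # 1 2

# The consumer has to decode with the same encoding to get the text back.
round_trip = data.decode("utf-8")
print(round_trip == text)  # True
```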
Thanks John. In reverse order, I understand point (2). I'm less clear on point (1).
If I generate a string of characters that I presume to be ASCII/UTF-8 (no \u0404-type characters) and write it to a file (or stdout), how does the default encoding affect that file? I'm not seeing anything unusual going on. Does it matter whether I open the file with vi, gedit, or emacs?
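One way to look at this question, as a rough sketch assuming Python 3: text that contains only ASCII characters encodes to exactly the same bytes under ASCII, UTF-8, and Latin-1, so the file looks the same in any editor. The choice of encoding only becomes visible once a non-ASCII character appears. The three encodings compared here are picked for illustration, not taken from the thread.

```python
ascii_only = "hello, world\n"

# Encode the same ASCII-only text three ways and compare the bytes.
encodings = ("ascii", "utf-8", "latin-1")
encoded = {enc: ascii_only.encode(enc) for enc in encodings}
same_bytes = len(set(encoded.values())) == 1
print(same_bytes)  # True

# With a non-ASCII character the encodings stop agreeing.
non_ascii = "\u0404"
utf8_bytes = non_ascii.encode("utf-8")  # two bytes in UTF-8
try:
    non_ascii.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII has no representation for U+0404 at all.
    ascii_ok = False
print(ascii_ok)  # False
```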
Another question: in mail I'm receiving many small blocks that look like sprites containing four small hex codes, scattered about the message, mostly where punctuation would be, I think. I'm guessing these are Unicode code points; if so, what is the best way to 'guess' the encoding? Is it coded somewhere in the stream or the protocol?
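On whether the encoding is carried in the protocol: MIME mail normally declares a character set in the Content-Type header, so that is one place to look before guessing. The sketch below assumes Python 3 and uses the standard library email package; the message itself is invented for the example, and falling back to ASCII when no charset is declared is an assumption of the sketch.

```python
import email

# A made-up message; the body is quoted-printable UTF-8 with curly quotes.
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=E2=80=9Chi=E2=80=9D\r\n"
)

msg = email.message_from_bytes(raw)

# The declared character set, read from the Content-Type header.
charset = msg.get_content_charset()
print(charset)  # utf-8

# decode=True undoes the transfer encoding and returns bytes;
# turning those bytes into text still needs the declared charset.
body_bytes = msg.get_payload(decode=True)
body = body_bytes.decode(charset or "ascii", errors="replace")
print("\u201chi\u201d" in body)  # True
```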