Re: unicode by default



MRAB <python@xxxxxxxxxxxxxxxxxxx> writes:

> You need to understand the difference between characters and bytes.

Yep. Those who don't, need to join us in the third millennium; the
resources pointed out in this thread are a good help with that.

> A string contains characters, a file contains bytes.

That's not true for Python 2.

I'd phrase that as:

* Text is a sequence of characters. Most inputs to the program,
including files, sockets, etc., contain a sequence of bytes.

* Always know whether you're dealing with text or with bytes. No object
can be both.

* In Python 2, ‘str’ is the type for a sequence of bytes. ‘unicode’ is
the type for text.

* In Python 3, ‘str’ is the type for text. ‘bytes’ is the type for a
sequence of bytes.
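
A minimal sketch of the points above, assuming Python 3 and UTF-8 as
the encoding (the sample word and the variable names are only
illustrative, not from the thread):

```python
# Python 3: 'str' is text (a sequence of characters),
# 'bytes' is a sequence of bytes. No object is both.
text = "caf\u00e9"               # four characters; the last is e-acute
data = text.encode("utf-8")      # text -> bytes, using a named encoding

print(type(text).__name__, len(text))   # the text type and its character count
print(type(data).__name__, len(data))   # the bytes type and its byte count

# Bytes read from a file or socket become text only by decoding,
# and you have to know which encoding was used.
roundtrip = data.decode("utf-8")

# Python 3 refuses to mix the two types implicitly.
try:
    text + data
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True
```

The character count and the byte count differ here because the
accented character takes two bytes in UTF-8. In Python 2 the same
idea applies with ‘unicode’ for the text and ‘str’ for the bytes.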

--
\ “I went to a garage sale. ‘How much for the garage?’ ‘It's not |
`\ for sale.’” —Steven Wright |
_o__) |
Ben Finney