unicode by default

Ben Finney ben+python at benfinney.id.au
Thu May 12 00:07:08 EDT 2011


MRAB <python at mrabarnett.plus.com> writes:

> You need to understand the difference between characters and bytes.

Yep. Those who don't understand it need to join us in the third
millennium, and the resources pointed out in this thread are a good
help with that.

> A string contains characters, a file contains bytes.

That's not true for Python 2.

I'd phrase that as:

* Text is a sequence of characters. Most inputs to the program,
  including files, sockets, etc., contain a sequence of bytes.

* Always know whether you're dealing with text or with bytes. No object
  can be both.

* In Python 2, ‘str’ is the type for a sequence of bytes. ‘unicode’ is
  the type for text.

* In Python 3, ‘str’ is the type for text. ‘bytes’ is the type for a
  sequence of bytes.
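
A minimal Python 3 sketch of the points above, assuming the incoming
bytes happen to be UTF-8 encoded. The names `raw`, `text` and `mixed_ok`
are made up for illustration; real input would come from a file or
socket, and its encoding has to be known rather than guessed.

```python
# Bytes as they might arrive from a file or socket.
# Assumption for this sketch: the data is UTF-8 encoded.
raw = b"caf\xc3\xa9"

# Decode at the boundary: bytes -> text.
text = raw.decode("utf-8")

# Each object is exactly one of the two types, never both.
assert isinstance(raw, bytes) and not isinstance(raw, str)
assert isinstance(text, str) and not isinstance(text, bytes)

# The lengths differ because one character can take several bytes.
print(len(raw))   # number of bytes
print(len(text))  # number of characters

# Encode on the way out: text -> bytes.
assert text.encode("utf-8") == raw

# Python 3 refuses to mix the two types implicitly.
try:
    text + raw
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True
```

In Python 2 the same bytes would be a ‘str’ and the decoded text a
‘unicode’, and mixing them triggers an implicit ASCII conversion
instead of an immediate TypeError, which is why knowing which one you
hold matters even more there.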

-- 
 \      “I went to a garage sale. ‘How much for the garage?’ ‘It's not |
  `\                                        for sale.’” —Steven Wright |
_o__)                                                                  |
Ben Finney


