[Tutor] Why difference between printing string & typing its object reference at the prompt?

Thu Oct 11 11:21:39 CEST 2012

On Thu, Oct 11, 2012 at 5:04 AM, Dave Angel <d at davea.name> wrote:
>
> Actually, the upper limit for a decoded utf-8 character is at least 6
> bytes.  I think it's 6, but it's no less than 6.

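Yes, the original UTF-8 design allowed sequences of up to 6 bytes, but
what would be the point? Unicode only has 17 planes, up to code point
0x10ffff, a limit imposed by UTF-16. Every code point in that range
fits in at most 4 bytes of UTF-8, and RFC 3629 restricts UTF-8 to
4-byte sequences for that reason.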
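A quick check, as a rough sketch: encode code points at the edges of
each UTF-8 length class and count the bytes. The helper name utf8_len
is made up for this example, and the code assumes Python 3.

```python
def utf8_len(codepoint):
    """Return how many bytes UTF-8 uses to encode one code point."""
    return len(chr(codepoint).encode('utf-8'))

# Boundaries of the 1-, 2-, 3- and 4-byte ranges (surrogates avoided,
# since they cannot be encoded on their own).
for cp in (0x7f, 0x80, 0x7ff, 0x800, 0xffff, 0x10000, 0x10ffff):
    print(hex(cp), utf8_len(cp))

# Anything past the last plane is rejected before encoding is even tried.
try:
    chr(0x110000)
except ValueError as exc:
    print('chr(0x110000) failed:', exc)
```

The top of the range, 0x10ffff, needs 4 bytes, so longer sequences
would never be used for valid Unicode text.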

> 2) There are many more byte formats, most of them predating Unicode
> entirely.  Many of these are specific to a particular language or
> national environment, and contain just those extensions to ASCII that
> the particular language deems useful.  Python provides encoders and
> decoders to many of these as well.

I mentioned 3 common formats that can completely represent Unicode
since this thread is mostly about Python 3 strings and repr -- at
least it started that way.
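As an illustration, assuming the three formats are UTF-8, UTF-16 and
UTF-32 (the earlier message isn't quoted here, so treat that as an
assumption), any Python 3 string round-trips through each of them. The
sample string is arbitrary.

```python
# One character from each UTF-8 length class: 'a', e-acute, the euro
# sign, and a code point outside the Basic Multilingual Plane.
s = 'a\xe9\u20ac\U0001f40d'

sizes = {}
for codec in ('utf-8', 'utf-16-le', 'utf-32-le'):
    data = s.encode(codec)
    assert data.decode(codec) == s   # lossless round trip
    sizes[codec] = len(data)
    print(codec, len(data), data)
```

The '-le' variants are used only to keep the byte-order mark out of the
output; plain 'utf-16' and 'utf-32' round-trip just as well.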

> 3) There are many things read and written in byte format that have no
> relationship to characters.  The notion of using text formats for all
> data (eg. xml) is a fairly recent one.  Binary files are quite common,
> and many devices require binary transfers to work at all.  So byte
> strings are not necessarily strings at all.

Sure, other than encoded strings, there are also more obvious examples
of data represented as bytes -- at least I hope they're obvious --
such as multimedia audio/video/images, sensor data, spreadsheets, and
so on. In main memory these exist as data structures/objects (bytes,
but not generally in a form suitable for transmission or storage).
Before being saved to files or sent over network streams, the data is
serialized and packed as a byte stream (e.g. with the struct module,
or pickle, which defaults to a binary protocol in Python 3), possibly
compressed to a smaller size with an integrity check added (e.g. the
gzip module, which stores a CRC-32 checksum; that detects corruption
but does not correct it), and possibly encrypted for security (e.g.
PyCrypto).
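A minimal sketch of that pipeline using only the standard library,
assuming Python 3.2 or later for gzip.compress. The sensor readings and
the record layout are invented for the example, and the encryption step
is left out since it needs a third-party package.

```python
import gzip
import pickle
import struct

# Made-up (sensor id, value) pairs living in memory as Python objects.
readings = [(1, 20.5), (2, 21.0), (3, 19.75)]

# Pack each record as a little-endian unsigned 16-bit int plus a
# 64-bit float: 10 bytes per record, no padding.
record = struct.Struct('<Hd')
packed = b''.join(record.pack(i, v) for i, v in readings)

# pickle is the generic alternative; its output is bytes as well.
pickled = pickle.dumps(readings)

# Compress the packed stream; the gzip container carries a CRC-32.
compressed = gzip.compress(packed)

# Reverse the steps to get the original objects back.
raw = gzip.decompress(compressed)
restored = [record.unpack_from(raw, offset)
            for offset in range(0, len(raw), record.size)]
print(restored == readings, pickle.loads(pickled) == readings)
```

None of these byte strings is text: decoding them with a character
encoding would be meaningless, which is the point being made above.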