[Tutor] Why difference between printing string & typing its object reference at the prompt?

Dave Angel d at davea.name
Thu Oct 11 11:04:59 CEST 2012


On 10/11/2012 04:40 AM, eryksun wrote:
> On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp <robertvstepp at gmail.com> wrote:
> .
>> What is the intended use of byte types?
>
> bytes objects are important for low-level data processing, such as
> file and socket I/O. The fundamental addressable value in a computer
> is a byte (at least for all common, modern computers). When you write
> a string to a file or socket, it has to be encoded as a sequence of
> bytes.
>
> <SNIP>
>
> Another common encoding is UTF-8. This maps each code to 1-4 bytes,

Actually, the original UTF-8 design allowed up to 6 bytes per encoded
character.  Current standards (RFC 3629) cap it at 4, since Unicode
stops at U+10FFFF.

> without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used
> when saving to a file). Since ASCII is so common, and since on many
> systems backward compatibility with ASCII is required, UTF-8 includes
> ASCII as a subset. In other words, codes below 128 are stored
> unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes.
> See the UTF-8 Wikipedia article for the details:
>
> http://en.wikipedia.org/wiki/UTF-8#Description
This shows cases for up to 6 bytes.
> <snip>
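
To make the byte counts concrete, here is a small sketch, assuming
Python 3 (where str is Unicode text and bytes is a separate type).  The
sample characters are my own picks, one per encoded length.  Python's
utf-8 codec only covers code points up to U+10FFFF, so these samples top
out at 4 bytes.

```python
import codecs

# One sample character for each UTF-8 encoded length (1 to 4 bytes).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to the number of bytes its UTF-8 encoding takes.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print("U+%04X -> %d byte(s): %r" % (ord(ch), n, ch.encode("utf-8")))

# ASCII text is stored unmodified, one byte per character.
ascii_bytes = "abc".encode("utf-8")

# The optional 3-byte UTF-8 BOM mentioned above.
bom = codecs.BOM_UTF8
```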

Three other things worth pointing out:  1) Python didn't define all these
byte formats.  These are standards which exist outside of the Python
world, and Python lets you coexist with them.  If you want to create a
text file that can be seen properly by an editor that only supports
UTF-8, you can't output UCS-4 and expect it to come up with anything but
gibberish.
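
A rough illustration of that mismatch, assuming Python 3 and using the
"utf-32-le" codec as a stand-in for UCS-4 (the sample text is made up).
The "editor" here is just a decode call that insists on UTF-8:

```python
text = "na\u00efve caf\u00e9"

# The same text in two different byte formats.
utf8_bytes = text.encode("utf-8")
utf32_bytes = text.encode("utf-32-le")   # 4 bytes per character

# A reader that only understands UTF-8 gets the right text back
# only when the writer used UTF-8 too.
round_trip = utf8_bytes.decode("utf-8")

# Feeding it the UTF-32 bytes gives NUL characters and replacement
# marks instead of the original text.
as_seen = utf32_bytes.decode("utf-8", errors="replace")

print(repr(round_trip))
print(repr(as_seen))
```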

2) There are many more byte formats, most of them predating Unicode
entirely.  Many of these are specific to a particular language or
national environment, and contain just those extensions to ASCII that
the particular language deems useful.  Python provides encoders and
decoders for many of these as well.
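
For instance (a sketch, assuming Python 3; the codec names are a few
that ship with the standard library, picked arbitrarily), the same
character ends up as different bytes depending on the format, and the
same byte reads back as different characters:

```python
ch = "\u00e9"   # e with acute accent

# Encode one character with several pre-Unicode single-byte codecs.
encoded = {}
for codec in ("latin-1", "cp1252", "cp437", "mac_roman"):
    encoded[codec] = ch.encode(codec)
    print(codec, encoded[codec])

# Going the other way: one byte, interpreted by two different codecs.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")
as_koi8r = raw.decode("koi8_r")   # a Cyrillic encoding
print(repr(as_latin1), repr(as_koi8r))
```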

3) There are many things read and written in byte format that have no
relationship to characters.  The notion of using text formats for all
data (e.g. XML) is a fairly recent one.  Binary files are quite common,
and many devices require binary transfers to work at all.  So byte
strings are not necessarily strings at all.

-- 

DaveA


