Python 2.x or 3.x, which is faster?

Steven D'Aprano steve at pearwood.info
Tue Mar 8 18:28:40 EST 2016


On Tue, 8 Mar 2016 10:53 pm, BartC wrote:

> On 08/03/2016 02:12, Steven D'Aprano wrote:
>> On Tue, 8 Mar 2016 09:39 am, BartC wrote:
> 
>>> I'm using it because this kind of file reading in Python is a mess. If I
>>> do a read, will I get a string, a byte sequence object, a byte-array, or
>>> array-array, or what?
>>
>> Calling it "a mess" is an exaggeration. There is a change between Python
>> 2 and 3:
>>
>> - in Python 2, reading from a file gives you bytes, that is, the
>> so-called "str" type, not unicode;
>>
>> - in Python 3, reading from a file in binary mode gives you bytes, that
>> is, the "bytes" type; reading in text mode gives you a string, the "str"
>> type.
>>
>> How is this a mess?
> 
>                Python 2     Python 3
> 
> Text mode     'str'          'str'
> Binary mode   'bytes'        'str'

That table is incorrect. In Python 2, bytes is just an alias for str:

[steve@ando ~]$ python2.7 -c "print bytes is str"
True


and reading from a file returns that same type, str, whether the file is
opened in text or binary mode:

[steve@ando ~]$ echo foo > /tmp/junk
[steve@ando ~]$ python2.7 -c "print type(open('/tmp/junk', 'r').read())"
<type 'str'>
[steve@ando ~]$ python2.7 -c "print type(open('/tmp/junk', 'rb').read())"
<type 'str'>


In Python 3, you get bytes in binary mode, and Unicode strings (known
as "str") in text mode:

[steve@ando ~]$ python3 -c "print(type(open('/tmp/junk', 'r').read()))"
<class 'str'>
[steve@ando ~]$ python3 -c "print(type(open('/tmp/junk', 'rb').read()))"
<class 'bytes'>


Here it is in table form:

--------  --------  --------
Mode      Python 2  Python 3
--------  --------  --------
Text      bytes     text
Binary    bytes     bytes
--------  --------  --------
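
The Python 3 column can be checked without touching /tmp/junk. This is
only a minimal sketch: it assumes Python 3, and the scratch file and the
ASCII encoding are chosen for illustration.

```python
import os
import tempfile

# Write four known bytes to a scratch file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo\n")

# Text mode: read() returns str (Unicode text).
with open(path, "r", encoding="ascii") as f:
    text = f.read()

# Binary mode: read() returns bytes.
with open(path, "rb") as f:
    data = f.read()

os.remove(path)

print(type(text).__name__, repr(text))
print(type(data).__name__, repr(data))
```

Under Python 2 both reads would report str, as the first column says.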



And here is a table summarising the name changes around text and byte
strings:

------------  ---------  --------
Type          Python 2   Python 3
------------  ---------  --------
Byte strings  str/bytes  bytes  
Text strings  unicode    str
------------  ---------  --------

Remember that when Python first named its string type, Unicode didn't even
exist and there was no concept of strings being anything but a pseudo-ASCII
byte string.


> For certain file formats, text mode can't be used because byte values
> such as 10 could be expanded to the two bytes 13,10 on some systems.

If it is a *text* file, then it shouldn't matter whether your lines end with
\r, \n or \r\n. Indeed, in text mode, Python will automatically convert all
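of these to \n on reading. (For writing in text mode, Python 3 by default
translates \n to the platform's line separator, which is \r\n on Windows;
pass newline='' to open() to write exactly what you tell it to write.)

So for a text file the translation is harmless, and for anything else
binary mode leaves the bytes alone.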
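
The translation on reading is easy to see for yourself. A small sketch,
assuming Python 3; the sample bytes are made up, and newline='' is the
documented way to switch the translation off:

```python
import os
import tempfile

raw = b"one\r\ntwo\rthree\n"   # three different line endings

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Default text mode (newline=None): every ending is read back as "\n".
with open(path, "r", encoding="ascii") as f:
    translated = f.read()

# newline="" disables translation, so the endings come back untouched.
with open(path, "r", encoding="ascii", newline="") as f:
    untranslated = f.read()

# Binary mode never alters the bytes.
with open(path, "rb") as f:
    binary = f.read()

os.remove(path)

print(repr(translated))
print(repr(untranslated))
print(repr(binary))
```

Only the first read changes anything; the other two return the data
exactly as it is stored on disk.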


> So binary needs to be used. But Py2 and Py3 return different results;
> and indexing a bytes object gives you an int, a str object gives you a
> str.

It is true, and unfortunate, that indexing a bytes object gives an int
instead of a byte string of length 1:

[steve@ando ~]$ python3 -c "print(b'Aardvark'[0])"
65


In hindsight, it was a mistake, but as we are now at Python 3.5, changing
it would break backwards compatibility within the 3.x series, so we are
stuck with it. Fortunately it is easy enough to work around: take a
one-element slice.

[steve@ando ~]$ python3 -c "print(b'Aardvark'[0:1])"
b'A'
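
The same behaviour in slightly more detail, as a sketch assuming Python 3
(the variable names are only for illustration):

```python
data = b"Aardvark"

first_int = data[0]            # indexing gives an int
first_byte = data[0:1]         # a one-element slice gives bytes
as_ints = list(data[:3])       # iteration yields ints as well
rebuilt = bytes([first_int])   # turn an int back into a one-byte bytes object

# Unlike indexing, slicing past the end does not raise IndexError.
empty = data[100:101]

print(first_int, first_byte, as_ints, rebuilt, empty)
```

The slice also has the advantage of working the same way on a Python 2
str, where it returns a one-character string.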



> If you pass a file-handle to a function, that function can't just do a
> read from that handle without considering whether the file might be in
> text or binary mode, or whether it's running under Py2 or Py3.

Why not? file.read() works exactly the same in all four cases.
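
The call is identical in every case; only the type of the result differs.
If a function needs one particular type, it can normalise the result
itself. A rough sketch for Python 3, where the helper name read_as_text
and the UTF-8 default are hypothetical choices, not anything from the
standard library:

```python
import io


def read_as_text(f, encoding="utf-8"):
    """Read everything from f and return text, whatever mode f is in.

    Hypothetical helper for illustration; the encoding default is an
    assumption.
    """
    data = f.read()
    if isinstance(data, bytes):   # handle was opened in binary mode
        return data.decode(encoding)
    return data                   # handle was opened in text mode


from_binary = read_as_text(io.BytesIO(b"foo\n"))
from_text = read_as_text(io.StringIO("foo\n"))
print(repr(from_binary), repr(from_text))
```

Both calls return the same str, so the caller never needs to know how
the handle was opened.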

It's hard to argue against your statement when I don't understand what
problem you think you have. A concrete example of this "mess" would help.    



> And I think someone pointed out the difference between 'bytes',
> 'bytearray' and 'array.array', but I can't find that post at the minute.

Er, yes? Of course there are differences between those three types. There's
also a difference between list, set and dict. What's your point?
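
For the record, the main differences are mutability and element type. A
short sketch, assuming Python 3 and with arbitrary sample values:

```python
from array import array

b = bytes([1, 2, 3])            # immutable sequence of ints in range(256)
ba = bytearray(b)               # mutable counterpart of bytes
arr = array("H", [1, 2, 300])   # typed array; "H" means unsigned short

ba[0] = 255                     # allowed: bytearray supports item assignment

bytes_is_immutable = False
try:
    b[0] = 255                  # not allowed: bytes is immutable
except TypeError:
    bytes_is_immutable = True

# array.array is not limited to single bytes: 300 fits in an unsigned short.
print(b, ba, arr, arr.itemsize, bytes_is_immutable)
```

Being immutable, bytes is also hashable and can be used as a dict key,
which the other two cannot.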

 

-- 
Steven



