Python 2.x or 3.x, which is faster?

BartC bc at freeuk.com
Wed Mar 9 09:03:42 EST 2016


On 09/03/2016 02:18, Steven D'Aprano wrote:
> On Wed, 9 Mar 2016 12:28 pm, BartC wrote:
>
>> (Which wasn't as painful as I'd expected. However the next project I
>> have in mind is 20K lines rather than 0.7K. For that I'm looking at some
>> mechanical translation I think. And probably some library to wrap around
>> Python's i/o.)
>
> You almost certainly don't need another wrapper around Python's I/O, making
> it slower still. You need to understand what Python's I/O is doing.

Well, the original project will be using its file i/o library. So it'll 
use the same interface that will be reimplemented on top of Python i/o.

And input operations mainly consist of grabbing an entire file at once. 
Output is a little more mixed.
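
A minimal sketch of that "grab the entire file" case, assuming the raw bytes are what's wanted (the helper name read_whole_file and the sample file are made up for illustration, not part of the original library):

```python
import os
import tempfile

def read_whole_file(path):
    """Return the entire contents of `path` as a bytes object."""
    # Binary mode: no decoding and no newline translation.
    with open(path, "rb") as f:
        return f.read()

# Small demonstration using a throwaway file (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"line 1\r\nline 2\r\n")
    data = read_whole_file(path)

print(len(data), data[:6])
```

Because nothing is decoded, the \r\n pairs come back untouched, which is probably what a reimplemented file interface would expect.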

> If you open a file in binary mode, Python will give you a stream of bytes
> (ordinal values 0 through 255 inclusive). Python won't modify or change
> those bytes in any way. Whatever it reads from disk, it will give to you.
>
> If you open a file in text mode, Python 3 will give you a stream of Unicode
> code points (ordinal values 0 through 0x10FFFF). Earlier versions of Python
> 3 may behave somewhat strangely with so-called "astral characters": I
> recommend that you avoid anything below version 3.3. Unless you are
> including (e.g.) Chinese or ancient Phoenician in your text file, you
> probably won't care.

I've just tried a UTF-8 file and getting some odd results. With a file 
containing [three euro symbols]:

€€€

(including a 3-byte utf-8 marker at the start), and opened in text mode, 
Python 3 gives me this series of values (i.e. the ord() of each character):

239
187
191
226
8218
172
226
8218
172
226
8218
172

And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter 
might depend on my console's code page setting. Changing it to UTF-8 
however (CHCP 65001 in Windows) gives me this error when I run the 
program again:

----------
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual 
way.
Please contact the application's support team for more information.
----------

(That was with 3.1; 3.4 gives the same set of characters as above, and 
shows the string differently, but still wrong. PyPy 3.2.4, meanwhile, 
gives a different set of byte values, all 0..255, and a different string 
again, although it now contains some actual € characters.)
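
For reference, the values listed above look like what comes out when the file's UTF-8 bytes are decoded as cp1252, which was presumably the default text encoding on this machine. A rough sketch under that assumption (the file name euros.txt is made up, and passing encoding="utf-8-sig" is just one way to name the encoding explicitly and drop the 3-byte marker):

```python
import os
import tempfile

# Bytes of a UTF-8 file holding the 3-byte marker plus three euro signs.
raw = b"\xef\xbb\xbf" + ("\u20ac" * 3).encode("utf-8")

# Decoding those bytes as cp1252 (an assumption about the default
# encoding in use) reproduces the values listed above.
as_cp1252 = [ord(ch) for ch in raw.decode("cp1252")]
print(as_cp1252)

# Opening the file with an explicit encoding avoids the guess;
# "utf-8-sig" also strips the leading marker.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "euros.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(raw)
    with open(path, encoding="utf-8-sig") as f:
        text = f.read()

print([ord(ch) for ch in text])  # 8364 is U+20AC, the euro sign
```

This only covers what open() returns; whether the console then displays the string correctly is a separate question that depends on the code page.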

So I think I'll skip Unicode handling to start off with! (I've already 
had plenty of fun and games with it in the past.)

-- 
Bartc





