Python 2.x or 3.x, which is faster?

Terry Reedy tjreedy at udel.edu
Wed Mar 9 10:42:53 EST 2016


On 3/9/2016 9:03 AM, BartC wrote:

> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
>
> €€€
>
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
>
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
>
> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter
> might depend on my console's code page setting.

It definitely does.
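
As for the ord() values themselves: they are what one gets when the
file's UTF-8 bytes are decoded as cp1252.  I am assuming (it is not
stated above) that cp1252 is the default encoding open() picked on that
system.  A minimal sketch that reproduces the numbers from an in-memory
byte string, and shows 'utf-8-sig' decoding the same bytes to three euro
signs with the marker stripped:

```python
import codecs

# The file as described: a 3-byte UTF-8 BOM followed by three euro signs.
data = codecs.BOM_UTF8 + ('\u20ac' * 3).encode('utf-8')

# Assumption: the text-mode default encoding was cp1252.
wrong = data.decode('cp1252')
print([ord(c) for c in wrong])
# [239, 187, 191, 226, 8218, 172, 226, 8218, 172, 226, 8218, 172]

# 'utf-8-sig' decodes as UTF-8 and drops a leading BOM if one is present.
right = data.decode('utf-8-sig')
print([hex(ord(c)) for c in right])
# ['0x20ac', '0x20ac', '0x20ac']
```

For a real file, passing encoding='utf-8-sig' to open() should apply the
same decoding.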

> Changing it to UTF-8 however (CHCP 65001 in Windows)

CP65001 is MS's ugly pretense of Unicode compatibility.  It has been 
known to be buggy for over a decade, though some people claim to have 
gotten some use out of it.

> gives me this error when I run the program again:
>
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
>
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------
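
The LookupError says the interpreter could not find a codec named
cp65001 while setting up the standard streams.  Whether a given Python
knows a codec name can be checked with codecs.lookup().  A small sketch
(has_codec is a name made up for this example):

```python
import codecs

def has_codec(name):
    """Return True if this Python can look up a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(has_codec('utf-8'))    # True
print(has_codec('cp65001'))  # depends on the Python version and platform
```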

> So I think I'll skip Unicode handling to start off with! (I've already
> had plenty of fun and games with it in the past.)

At least on Windows, use IDLE for the BMP subset of Unicode.  tk, and 
hence tkinter and IDLE, can handle any char in the BMP.  I believe 
that which chars are actually displayed and which are shown as boxes 
depends on the font.  On my US Win10 system:

IDLE with Lucida Console:
 >>> s = '€€€'
 >>> s
'€€€'

In the console interpreter: '???' is printed.
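
To tell in advance whether a string stays within the BMP, one can check
its code points.  A small sketch (in_bmp is a name invented here, and it
assumes Python 3.3 or later, where each str item is a full code point):

```python
def in_bmp(s):
    """Return True if every character of s is in the Basic Multilingual Plane."""
    return all(ord(c) <= 0xFFFF for c in s)

print(in_bmp('\u20ac\u20ac\u20ac'))  # True: the euro sign is U+20AC
print(in_bmp('\U0001F600'))          # False: U+1F600 lies outside the BMP
```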


-- 
Terry Jan Reedy




