Python 3 is killing Python

Glenn Linderman v+python at g.nevcal.com
Sat Aug 2 02:18:47 EDT 2014


On 7/16/2014 7:27 AM, Frank Millman wrote:
> I just tried an experiment in my own project. Ned Batchelder, in his
> Pragmatic Unicode presentation, http://nedbatchelder.com/text/unipain.html,
> suggests that you always have some unicode characters in your data, just to
> ensure that they are handled correctly. He has a tongue-in-cheek example
> which spells the word PYTHON using various exotic unicode characters. I used
> this to populate a field in my database, to see if it would display in my
> browser-based client.
>
> The hardest part was getting it in. There are 6 characters, but utf-8
> requires 16 bytes to store it -
>
>      b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8')
>
> However, that was it. Without any changes to my program, it read it from the
> database and displayed it on the screen. IE8 could only display 2 out of the
> 6 characters correctly, and Chrome could display 5 out of 6, but that is a
> separate issue. Python3 handled it perfectly.

Wrapping the above in a print(), on Windows, I get:

Traceback (most recent call last):
   File "D:\my\py\python-utf8.py", line 1, in <module>
     print(b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8'))
   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
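
The failure above can be reproduced without a Windows console by encoding
the decoded string to cp437 directly, since that is the step print() fails
on in the traceback. This is only a sketch: the helper name try_encode is
made up for illustration, and it assumes the console code page is cp437, as
the traceback indicates.

```python
# The 16 UTF-8 bytes from the quoted message, decoded to 6 characters.
data = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'
text = data.decode('utf-8')


def try_encode(s, encoding):
    """Hypothetical helper: return the encoded bytes, or the
    UnicodeEncodeError raised while encoding."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# cp437 has no mapping for these characters, so this yields the exception
# object rather than bytes.
result = try_encode(text, 'cp437')

# ascii() keeps the output printable on any console.
print(len(data), len(text), ascii(text))
print(type(result).__name__)
```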

So Python 3 doesn't handle it perfectly on Windows. I have seen people
blame the Windows console for that, but the Windows console can properly
display all of those characters if the proper APIs are used. The bug is
7 years old: http://bugs.python.org/issue1602. It hasn't been fixed,
although the technology for fixing it is available. Various workarounds
(with limitations) have been available for 5 years, and patches that
work pretty well have been available for 3 years. However, just a few
days ago, on 26 July 2014, Drekin had an insight that may lead to a
patch that works well enough to be integrated into some future version
of Python. I hope he follows up on it. This is a serious limitation, and
it is, and always has been, a bug in Python 3's Unicode handling on
Windows.
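
For illustration, here is a minimal sketch of one limited workaround. It is
not one of the patches or workarounds attached to issue1602, and the
function name printable is hypothetical. It does not make the console
display the characters; it only avoids the UnicodeEncodeError by replacing
whatever the console encoding cannot represent with backslash escapes.

```python
import sys


def printable(text, encoding=None):
    """Hypothetical helper: return text with every character that the
    target encoding cannot represent replaced by a backslash escape."""
    # Fall back to ASCII if stdout has no usable encoding attribute.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Round-trip through the target encoding; unencodable characters
    # become escapes such as \u2119 instead of raising an error.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


data = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'
text = data.decode('utf-8')

# Safe on any console: the result contains only characters that the
# console encoding can represent.
print(printable(text))
```

The obvious limitation is that the output is lossy: on a cp437 console the
reader sees escape sequences, not the characters themselves.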