Looking for UNICODE to ASCII Conversion Example Code

Roy Smith roy at panix.com
Sat Oct 19 21:52:35 EDT 2013


In article <mailman.1267.1382220612.18130.python-list at python.org>,
 Chris Angelico <rosuav at gmail.com> wrote:

> On Sun, Oct 20, 2013 at 3:49 AM, Roy Smith <roy at panix.com> wrote:
> > So, yesterday, I tracked down an uncaught exception stack in our logs to a 
> > user whose username included the unicode character 'SMILING FACE WITH 
> > SUNGLASSES' (U+1F60E).  It turns out, that's perfectly fine as a user name, 
> > except that in one obscure error code path, we try to str() it during some 
> > error processing.
> 
> How is that a problem? Surely you have to deal with non-ASCII
> characters all the time - how is that particular one a problem? I'm
> looking at its UTF-8 and UTF-16 representations and not seeing
> anything strange, unless it's the \x0e in UTF-16 - but, again, you
> must surely have had to deal with
> non-ASCII-encoded-whichever-way-you-do-it.
> 
> Or are you saying that that particular error code path did NOT handle
> non-ASCII characters?

Exactly.  The underlying error was caught, but then we raised a second 
exception, a UnicodeEncodeError, while generating the text of the error 
message to log!
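
A minimal sketch of that failure mode and one way around it, not the actual
code from the application in question.  In Python 2, str() on a unicode
object does an implicit strict encode to ASCII, which the explicit
.encode("ascii") call below imitates.  The helper name to_ascii and the
sample user name are made up for illustration.

```python
def to_ascii(text):
    """Render text as ASCII, escaping whatever does not fit.

    The "backslashreplace" error handler substitutes a backslash escape
    for each unencodable character instead of raising UnicodeEncodeError.
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


# Hypothetical user name ending in U+1F60E SMILING FACE WITH SUNGLASSES.
username = u"roy\U0001F60E"

# A strict ASCII encode (what Python 2's str() does implicitly) fails
# for any non-ASCII character.
try:
    username.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The escaped form is plain ASCII, so it is safe to put in a log message.
safe = to_ascii(username)
print("strict encode failed: %s" % strict_failed)
print("safe for logging: %s" % safe)
```

Escaping loses nothing, since the code point is still readable in the log;
passing "replace" or "ignore" instead would avoid the exception too, but would
throw the character away.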

> If so, that's a strong argument for moving to
> Python 3, to get full Unicode support in _all_ branches.

Well, yeah.  The problem is, my pip requirements file lists 76 modules 
(and installing all those results in 144 modules, including the cascaded 
dependencies).  Until most of those are Python 3 ready, we can't move.

Heck, I can't even really move off 2.6 because we use Amazon's EMR 
service, which is stuck on 2.6.


