Looking for UNICODE to ASCII Conversion Example Code

Roy Smith roy at panix.com
Sat Oct 19 12:49:35 EDT 2013


On Saturday, October 19, 2013 12:16:02 PM UTC-4, Steven D'Aprano wrote:

> Another reasonable use for accent-stripping is searches. If I'm searching 
> for music by the Blue Öyster Cult, it would be good to see results for 
> Blue Oyster Cult as well.

Tell me about it (I work at Songza; music search is what we do).  Accents are easy (Beyoncé, for example).  What about NIN (where one of the N's is supposed to be backwards, but I can't figure out how to type that)?  And Ke$ha.  And "The artist previously known as a glyph which doesn't even exist in Unicode 6.3".
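
For the easy case (accents), a minimal sketch of one common approach, assuming Python 3 and only the standard unicodedata module: decompose with NFKD, then drop the combining marks.  The names strip_accents and search_key are made up for illustration; this is not what we actually run at Songza.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents, umlauts) removed."""
    # NFKD splits e.g. U+00E9 into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() is non-zero for combining marks, so skip those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text):
    """Hypothetical search key: accent-stripped and case-folded."""
    return strip_accents(text).casefold()
```

With this, search_key("Blue Öyster Cult") and search_key("Blue Oyster Cult") come out equal.  It does nothing for Ke$ha or a backwards N, since those characters have no decomposition; they would need a hand-built mapping.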

> On the other hand, if you name your band ▼□■□■□■, you deserve to wallow 
> in obscurity :-)

Indeed.

So, yesterday, I tracked down an uncaught exception stack in our logs to a user whose username included the Unicode character 'SMILING FACE WITH SUNGLASSES' (U+1F60E).  It turns out that's perfectly fine as a username, except that in one obscure error code path, we try to str() it during some error processing.  If you named your band something which included that character, would you expect it to match a search for the same name but with 'WHITE SMILING FACE' (U+263A) instead?
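
A rough sketch of both halves of that, assuming Python 3 (where the closest analogue to the str() failure is a strict encode to ASCII).  The helper safe_ascii is hypothetical, just one way an error path could render such a name without blowing up.

```python
import unicodedata


def safe_ascii(text):
    """Hypothetical helper: ASCII-only rendering, e.g. for log messages."""
    # Non-ASCII characters become backslash escapes instead of raising.
    return text.encode("ascii", "backslashreplace").decode("ascii")


sunglasses = "\U0001F60E"
white_smile = "\u263A"

# A strict ASCII encode fails on the emoji.
try:
    sunglasses.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The two smileys are distinct characters with distinct names.
names = (unicodedata.name(sunglasses), unicodedata.name(white_smile))

# Neither has a decomposition, so normalization does not make them match.
same_after_nfkd = (
    unicodedata.normalize("NFKD", sunglasses)
    == unicodedata.normalize("NFKD", white_smile)
)
```

So the accent-stripping trick doesn't help here: as far as normalization is concerned these are unrelated characters, and any "close enough" matching between them would have to come from a table somebody writes by hand.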




