Looking for UNICODE to ASCII Conversion Example Code

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Oct 19 12:16:02 EDT 2013


On Sat, 19 Oct 2013 11:14:30 -0300, Zero Piraeus wrote:

> :
> 
> On Sat, Oct 19, 2013 at 09:19:12AM +0000, Steven D'Aprano wrote:
>> Make no mistake, this sort of simple-minded stripping of accents and
>> diacritics is an extremely ham-fisted thing to do.
[...]
> Joking aside, there is a legitimate use for asciifying text in this way:
> creating unambiguous identifiers.
> 
> For example, a miscreant may create the username 'míguel' in order to
> pose as another user 'miguel', relying on other users' inattentiveness.
> Asciifying is one way of reducing the risk of that.

I'm pretty sure that Oliver and 0liver may not agree. Neither will 
Megal33tHaxor and Mega133tHaxor.
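
To make that concrete, here is a minimal sketch. The asciify() helper is 
my own illustration, not anything from the standard library, and it 
assumes that NFKD decomposition followed by dropping every non-ASCII 
character is the sort of "asciifying" being proposed:

```python
import unicodedata

def asciify(text):
    # Hypothetical helper: decompose accented characters (NFKD), then
    # silently drop every code point that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# The accented impostor collapses onto the real name...
print(asciify("míguel") == asciify("miguel"))   # True

# ...but look-alikes built purely from ASCII sail straight through.
print(asciify("Oliver") == asciify("0liver"))   # False
```

So the accented name is caught, but the digit-for-letter substitution is 
not, because there is nothing non-ASCII to strip.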

It's true that there are *more* opportunities for this sort of 
shenanigans with Unicode, so I guess your comment about "reducing" the 
risk (rather than eliminating it) is strictly correct. But there are 
other (better?) ways to do so, e.g. you could generate an identicon for 
the user to act as a visual checksum:

http://en.wikipedia.org/wiki/Identicon 
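
A rough sketch of the idea, drawn as text rather than as an image. This 
is not the actual Identicon algorithm; the text_identicon() name, the 
5x5 grid, and the choice of SHA-256 are all assumptions made for the 
sake of illustration:

```python
import hashlib

def text_identicon(username):
    # Hypothetical sketch: hash the name, then use the low bit of the
    # first 15 digest bytes to fill the left three columns of a 5x5
    # grid, mirroring them to make the pattern symmetric.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    rows = []
    for r in range(5):
        left = ["#" if digest[r * 3 + c] & 1 else "." for c in range(3)]
        rows.append("".join(left + left[-2::-1]))
    return "\n".join(rows)

for name in ("miguel", "míguel"):
    print(name)
    print(text_identicon(name))
    print()
```

Names that differ by a single character hash to unrelated digests, so 
their patterns will almost always look different at a glance, even when 
the names themselves do not.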


Another reasonable use for accent-stripping is searches. If I'm searching 
for music by the Blue Öyster Cult, it would be good to see results for 
Blue Oyster Cult as well. And vice versa. (A good search engine should 
consider *adding* accents as well as removing them.)
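
One way to get both directions at once is to fold the query and the 
indexed text to the same accent-free form before comparing. This is 
only a sketch: fold() and search() are names I have made up, and it 
assumes that dropping combining marks after NFD decomposition is good 
enough for the text being searched:

```python
import unicodedata

def fold(text):
    # Hypothetical helper: decompose, drop combining marks (accents,
    # umlauts, ...), then case-fold for case-insensitive matching.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.casefold()

def search(query, titles):
    # Fold both sides, so an accented query matches plain text and
    # a plain query matches accented text.
    key = fold(query)
    return [title for title in titles if key in fold(title)]

titles = ["Blue Öyster Cult", "Blue Oyster Cult", "Motörhead"]
print(search("blue oyster", titles))   # ['Blue Öyster Cult', 'Blue Oyster Cult']
print(search("Öyster", titles))        # ['Blue Öyster Cult', 'Blue Oyster Cult']
```

Note that the original titles are returned untouched; only the 
comparison keys are stripped.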

On the other hand, if you name your band ▼□■□■□■, you deserve to wallow 
in obscurity :-)


-- 
Steven


