[Python-ideas] Proposal for default character representation

Steven D'Aprano steve at pearwood.info
Fri Oct 14 20:18:10 EDT 2016


On Fri, Oct 14, 2016 at 07:56:29AM -0400, Random832 wrote:
> On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote:
> > Good luck with that last one. Even if you could convince the Chinese and 
> > Japanese to swap to ASCII, I'd like to see you pry the emoji out of the 
> > young folk's phones.
> 
> This is actually probably the one part of this proposal that *is*
> feasible. While encoding emoji as a single character each makes sense
> for a culture that already uses thousands of characters; before they
> existed the English-speaking software industry already had several
> competing "standards" emerging for encoding them as sequences of ASCII
> characters.

It really isn't feasible to use emoticons instead of emoji, not if 
you're serious about it. To put it bluntly, emoticons are amateur hour. 
Emoji implemented as dedicated code points are what professionals use. 
Why do you think phone manufacturers are standardising on dedicated code 
points instead of using emoticons?

Anyone who has ever posted (say) source code on IRC, Usenet, email or 
many web forums has probably seen unexpected smileys in the middle of 
their code (false positives). That's because some sequence of characters 
is being wrongly interpreted as an emoticon by the client software. 
The more emoticons you support, the greater the chance this will 
happen. A concrete example: bash code in Pidgin (IRC) will often show 
unwanted smileys.
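
As a rough illustration of how such false positives arise, here is a 
minimal sketch of a context-free emoticon renderer. The table and the 
function name render_emoticons are made up for the example and are not 
taken from Pidgin or any other real client.

```python
import re

# Hypothetical emoticon table; a real client would ship a much larger one.
EMOTICONS = {
    ":)": "\u263A",       # WHITE SMILING FACE
    ":(": "\u2639",       # WHITE FROWNING FACE
    "8)": "\U0001F60E",   # SMILING FACE WITH SUNGLASSES
}

# One alternation over all emoticons, with regex metacharacters escaped.
PATTERN = re.compile("|".join(re.escape(e) for e in EMOTICONS))


def render_emoticons(text):
    """Replace every emoticon sequence, with no awareness of context."""
    return PATTERN.sub(lambda m: EMOTICONS[m.group(0)], text)


# Bash substring expansion contains ":(" and the Python call ends in "8)".
bash_line = 'echo "${name:(-3)}"'
python_line = "total = sum(range(18))"

print(render_emoticons(bash_line))
print(render_emoticons(python_line))
```

Neither input line contains an intended smiley, yet both come out 
altered, and every entry added to the table gives ordinary code one 
more way to collide with it.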

The quality of applications can vary greatly: once the false emoticon is 
displayed as a graphic, you may not be able to copy the source code 
containing the graphic and paste it into a text editor unchanged.

There are false negatives as well as false positives: if your :-) 
happens to fall on the boundary of a line, and your software breaks the 
sequence with a soft line break, instead of seeing the smiley face you 
expected, you might see a line ending with :- and a new line starting 
with ).
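
The false negative can be sketched the same way. The helper hard_wrap 
below is hypothetical and stands in for any display code that breaks 
lines at a fixed width without regard to content. It assumes the emoji 
is a single code point such as U+263A.

```python
def hard_wrap(text, width):
    """Break text into chunks of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def has_emoticon(lines):
    """True if some single line still contains the intact ':-)' sequence."""
    return any(":-)" in line for line in lines)


def has_emoji(lines):
    """True if some single line contains U+263A WHITE SMILING FACE."""
    return any("\u263A" in line for line in lines)


emoticon_lines = hard_wrap("nice work :-)", 12)
emoji_lines = hard_wrap("nice work \u263A", 12)

print(emoticon_lines, has_emoticon(emoticon_lines))
print(emoji_lines, has_emoji(emoji_lines))
```

At width 12 the three-character sequence is cut after ":-", so a 
line-by-line renderer never sees it. A single code point cannot be 
split by a character-based wrap, whatever the width.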

It's hard to use punctuation or brackets around emoticons without 
risking them being misinterpreted as an invalid or different sequence. 

If you are serious about offering smileys, snowmen and piles of poo to 
your users, you are much better off supporting real emoji (dedicated 
Unicode characters) instead of emoticons. It is much easier to support ☺ 
than :-) and you don't need any special software apart from fonts that 
support the emoji you care about.
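
To show what "no special software" means in practice, here is a small 
sketch using only Python 3's standard unicodedata module. The helper 
name describe is invented for the example. Each of the three 
characters is an ordinary code point with an official name, and it 
encodes to UTF-8 like any other text.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name, UTF-8 byte length)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))


# Smiley, snowman and pile of poo, written as escapes for portability.
for ch in "\u263A\u2603\U0001F4A9":
    print(describe(ch))
```

No parsing step or lookup table is involved. Whether the glyphs 
actually display still depends on the fonts installed.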



-- 
Steve

