Unicode Support in Ruby, Perl, Python, Emacs Lisp

Sun Oct 10 06:27:10 EDT 2010

On Sat, 09 Oct 2010 15:45:42 -0700, Sean McAfee wrote:

>> I'll have to say, as far as text processing goes, the most beautiful
>> lang with respect to Unicode is Emacs Lisp. In elisp code (e.g.
>> "Generate a Web Links Report with Emacs Lisp"), I don't have to declare
>> any of the Unicode or encoding stuff. I simply write code to process
>> strings or buffer text, without even having to know what encoding it
>> is. Emacs the environment takes care of all that.
> 
> It's not quite perfect, though.  I recently discovered that if I enter a
> Chinese character using my Mac's Chinese input method, and then enter
> the same character using a Japanese input method, Emacs regards them as
> different characters, even though they have the same Unicode code point.
> For example, from describe-char:
> 
>   character: 一 (43323, #o124473, #xa93b, U+4E00)
>   character: 一 (55404, #o154154, #xd86c, U+4E00)
> 
> On saving and reverting a file containing such text, the characters are
> "normalized" to the Japanese version.
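The root of this is that Unicode unifies Han characters that the older national charsets encode separately. Here is a minimal Python 3 sketch. It assumes GB2312 and Shift_JIS as representative Chinese and Japanese encodings, which is not necessarily what the Mac input methods or Emacs use internally. It shows two different byte sequences decoding to the same code point. After that, the string no longer records which charset it came from:

```python
# Encode U+4E00 ("one") in a Chinese and a Japanese legacy charset.
# GB2312 and Shift_JIS are assumed here purely as representative examples.
char = "\u4e00"
gb_bytes = char.encode("gb2312")
sjis_bytes = char.encode("shift_jis")

# The byte sequences differ...
print(gb_bytes, sjis_bytes)  # b'\xd2\xbb' b'\x88\xea'

# ...but both decode to the same single code point, so once the text is
# converted to Unicode it no longer says which charset it came from.
from_gb = gb_bytes.decode("gb2312")
from_sjis = sjis_bytes.decode("shift_jis")
print(from_gb == from_sjis, hex(ord(from_gb)))  # True 0x4e00
```

An editor that wants to keep the distinction has to carry the charset alongside the text. Emacs's per-charset internal codes in the describe-char output above look like exactly that.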

I don't know about GNU Emacs, but XEmacs doesn't use Unicode internally;
it uses byte strings with associated encodings. Some of us like it that
way, as converting to Unicode may not be reversible, and it's often
important to preserve exact byte sequences.
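Here is a minimal Python 3 sketch of the reversibility point. It assumes bytes that aren't valid UTF-8 get decoded as UTF-8 anyway; a Latin-1 "é" is chosen purely for illustration. With the common "replace" error handler the original byte is lost for good. Python's "surrogateescape" handler is one way to carry undecodable bytes through a str and get the exact sequence back:

```python
raw = b"caf\xe9"  # "café" in Latin-1; a lone 0xE9 byte is not valid UTF-8

# Lossy: the bad byte becomes U+FFFD, which re-encodes as EF BF BD,
# so the round trip does not give back the original bytes.
lossy = raw.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == raw)  # False

# Lossless: surrogateescape maps 0xE9 to the lone surrogate U+DCE9,
# and encoding with the same handler restores the original byte.
escaped = raw.decode("utf-8", errors="surrogateescape")
print(escaped.encode("utf-8", errors="surrogateescape") == raw)  # True
```

The trade-off is that the escaped string contains lone surrogates, which a strict UTF-8 encode will refuse. So this technique preserves the bytes without turning them into meaningful text.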

FWIW, I'd expect Ruby to have worse support for Unicode, as its creator is
Japanese. Unicode is still far more popular in locales which historically
used ASCII or "almost ASCII" (e.g. ISO-646-*, ISO-8859-*) encodings than
in locales which had to use a radically different encoding.