Unicode Support in Ruby, Perl, Python, Emacs Lisp

Xah Lee xahlee at gmail.com
Sat Oct 9 20:40:36 EDT 2010


2010-10-09

On Oct 9, 3:45 pm, Sean McAfee <eef... at gmail.com> wrote:
> Xah Lee <xah... at gmail.com> writes:
> > Perl's exceedingly lousy unicode support hack is well known. In fact
> > it is the primary reason i “switched” to python for my scripting needs
> > in 2005. (See: Unicode in Perl and Python)
>
> I think your assessment is antiquated.  I've been doing Unicode
> programming with Perl for about three years, and it's generally quite
> wonderfully transparent.

you are probably right. The last period i did serious perl was 1998 to
2004. Since then, i have pretty much lost contact with the perl community.

i have about 5 years of 8-hours-a-day experience with perl... the app we
wrote was probably one of the largest perl web apps at the time, say
within the top 10, during the dot com days.

spent 2 years with python, about 2005 to 2006, but mostly just personal
dabbling.

my dilemma is this... i am really tired of perl, so i thought python was
my solution. Comparing the syntax, semantics, etc, i really do find
python better, but to know python as well as i know perl, or, to know
a lang really as an expert (e.g. intimately familiar with all the ins
and outs of constructs, idioms, their speeds, libraries out there,
their nature, which are used, their bugs etc), takes years. So, now and
then i have this psychological urge to totally ditch perl and hug
python 100% ... but it takes a huge amount of time to dig into a lang
well again, so sometimes i think of sticking with perl due to my
existing knowledge and forthwith stop wasting valuable time. But then,
whenever i work in perl with its hack nature and crooked community
(all those mongers fuck), especially the syntax for nested list/hash
that's more than 3 levels deep (and my code almost always relies on
nested list/hash to do things since am a functional programer), and
compare it to python's syntax for nested structures, i ask myself again,
is this shit really what i want to keep on at?
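
for illustration, a minimal python sketch of the kind of nested list/hash access meant above. The data and the names (`site`, `hits_for`) are made up for the example, not taken from any real app, and the perl line in the comment is only roughly how the same access tends to be written there.

```python
# Hypothetical 4-level structure: dict -> dict -> list -> dict.
site = {
    "pages": {
        "index.html": [
            {"lang": "en", "hits": 120},
            {"lang": "zh", "hits": 45},
        ],
    },
}

def hits_for(data, page, lang):
    """Return the hit count for one page/language pair, or None if absent."""
    for entry in data["pages"].get(page, []):
        if entry["lang"] == lang:
            return entry["hits"]
    return None

# Each nesting level is a plain subscript in python.
# The perl equivalent would be roughly $site->{pages}{'index.html'}[1]{hits},
# with explicit dereferencing (e.g. @{ ... }) needed to loop over an inner list.
zh_hits = site["pages"]["index.html"][1]["hits"]
```

here `zh_hits` is 45 and `hits_for(site, "index.html", "en")` gives 120; a page that isn't there gives None.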

and then python 3 came in, and over the years i learned that Guido really
hates functional programing (he understands it nil), and python is
moving more into oop mumbo jumbo with more special syntaxes and
special semantics. (and perl is trivially far more capable at
functional programing than python) So, this puts a damper on my
mental struggle for python.

in the end i really haven't decided on anything, as usual... it's not
really a concrete, answerable question anyway, it's just a psychological
struggle over some fuzzy ideal about efficiency and the perfect lang.

and there's ruby... (among others) and because i'm such a douchebag for
langs, now and then i waste my time venturing to read about ruby. The
unconscious excuse is that maybe ruby will turn out to simply solve all
my life's problems, but nagging in the back of my mind is the reality
that, yeah, go spend 3 years 8 hours a day on ruby, then possibly it'll
be as practically useful to me as perl already is, and, no, it won't
bring you anything extra as far as lang goes, for that you go to
OCaml/F#, erlang, Mathematica ... and who knows what kinda hidden needle
in the eye i'll discover on my road in ruby.

btw, this is all just a geek's mental disorder, common among many who
are into lang design and beauty etc type of shit. (a high percentage of
this crowd hangs out in newsgroups) But the reality is that this
psychological problem really doesn't have much practical justification
... it's just fret, fret, fret. Fret, fret, fret. Years of fretting,
while others have written great apps all over the web.

in practice, i have not even had a need for perl or python in my work
since about 2006, except for a few find/replace scripts for text
processing that i wrote in the past. And, since about 2007, i've
been writing more and more in elisp. (and this emacs beast is really a
true love more than anything) So these days, almost all of my scripts
are in elisp. (and my job these days is mainly just text processing
programing)

• 〈Xah on Programing Languages〉
http://xahlee.org/Periodic_dosage_dir/comp_lang.html

> On the programmers' web site stackoverflow.com, I flag questions with
> the "unicode" tag, and of questions that mention a specific language,
> Python and C++ seem to come up the most often.
>
> > I'll have to say, as far as text processing goes, the most beautiful
> > lang with respect to unicode is emacs lisp. In elisp code (e.g.
> > Generate a Web Links Report with Emacs Lisp ), i don't have to declare
> > none of the unicode or encoding stuff. I simply write code to process
> > string or buffer text, without even having to know what encoding it
> > is. Emacs the environment takes care of all that.
>
> It's not quite perfect, though.  I recently discovered that if I enter a
> Chinese character using my Mac's Chinese input method, and then enter
> the same character using a Japanese input method, Emacs regards them as
> different characters, even though they have the same Unicode code point.
> For example, from describe-char:
>
>   character: 一 (43323, #o124473, #xa93b, U+4E00)
>   character: 一 (55404, #o154154, #xd86c, U+4E00)

that's because you are using a pre-23 emacs. Try switching to emacs 23,
which uses a utf-8 based representation for chars internally.

> On saving and reverting a file containing such text, the characters are
> "normalized" to the Japanese version.
>
> I suppose this might conceivably be the correct behavior, but it sure
> was a surprise that (equal "一" "一") can be nil.

(equal "一" "一")

with emacs 23.*, this evals to t (true).
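
a rough python 3 analogue, as a sketch: when strings are sequences of unicode code points, two copies of U+4E00 compare equal no matter how they were entered. The variable names and the input-method labels in the comments are just assumptions for illustration; this says nothing about emacs internals beyond the point above.

```python
# Python 3 strings are sequences of Unicode code points.
from_chinese_ime = "\u4e00"   # 一 (assumed: typed with a Chinese input method)
from_japanese_ime = "一"       # 一 (assumed: typed with a Japanese input method)

# Equality is decided by code points, not by how the character was entered.
same_char = from_chinese_ime == from_japanese_ime
code_point = ord(from_japanese_ime)
utf8_bytes = from_japanese_ime.encode("utf-8")

print(same_char)         # True
print(hex(code_point))   # 0x4e00
print(utf8_bytes)        # b'\xe4\xb8\x80'
```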

• 〈New Features in Emacs 23〉
http://xahlee.org/emacs/emacs23_features.html

• 〈Emacs and Unicode Tips〉
http://xahlee.org/emacs/emacs_n_unicode.html

• 〈All about Unicode〉
http://xahlee.org/Periodic_dosage_dir/unicode.html

 Xah ∑ xahlee.org ☄


