Japanese encoding ISO-2022-JP in Python vs. Perl

Tue Oct 23 06:37:30 EDT 2007

Hi,
  I am rather new to Python, and am currently struggling with some
encoding issues.  I have some UTF-8-encoded text which I need to
encode as ISO-2022-JP before sending it out to the world.  I am using
Python's encode method:
--
# var is assumed to already be a unicode object (decoded from UTF-8)
var = var.encode("iso-2022-jp", "replace")
print var
--

 I am using the 'replace' argument because there seem to be a couple
of Japanese characters in my text which Python can't convert to
ISO-2022-JP.  With 'replace', each of them comes out as a question
mark, so the output looks like this:
↓東京???日比谷線?北千住行
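
One way to pin down which characters are being turned into question
marks is to try encoding them one at a time in strict mode.  The
following is only a minimal sketch: the helper name `unencodable` is
made up for illustration, and the sample string is an assumption (it
guesses that the problem text contains half-width katakana), not the
actual input.

```python
import unicodedata


def unencodable(text, codec="iso-2022-jp"):
    """Return (code point, Unicode name) for each character the codec rejects."""
    rejected = []
    for ch in text:
        try:
            ch.encode(codec)  # strict mode raises instead of substituting "?"
        except UnicodeEncodeError:
            rejected.append(("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return rejected


# ASSUMPTION: hypothetical sample with half-width katakana in place of the
# characters that came out as "?"; the real input may differ.
sample = (u"\u2193\u6771\u4eac\uff92\uff84\uff9b"
          u"\u65e5\u6bd4\u8c37\u7dda\uff65\u5317\u5343\u4f4f\u884c")

for code_point, name in unencodable(sample):
    print("%s %s" % (code_point, name))
```

Running this against the real text should list exactly the characters
that 'replace' is hiding.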

 However, if I use Perl's Encode module to re-encode the exact same bit
of text:
--
use Encode;
$var = encode("iso-2022-jp", decode("utf8", $var));
print $var;
--

 I get proper output (no unsightly question marks):
↓東京メトロ日比谷線・北千住行
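
If the difference comes from half-width katakana in the input (an
assumption; the Perl output above shows full-width メトロ and ・, which
would be consistent with Perl widening them before encoding), then
normalizing to NFKC before encoding is one way to get similar output
from Python.  A sketch, with `to_iso2022jp` as a made-up helper name:

```python
import unicodedata


def to_iso2022jp(text):
    """Fold compatibility forms (e.g. half-width katakana), then encode."""
    # NFKC maps half-width katakana to their full-width equivalents,
    # which JIS X 0208 (and therefore ISO-2022-JP) can represent.
    folded = unicodedata.normalize("NFKC", text)
    return folded.encode("iso-2022-jp", "replace")


# ASSUMPTION: hypothetical input containing half-width katakana.
sample = u"\uff92\uff84\uff9b\uff65"
encoded = to_iso2022jp(sample)
print(repr(encoded.decode("iso-2022-jp")))
```

Note that NFKC also folds other compatibility characters (full-width
Latin letters, for example), which may or may not be wanted.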

So, what's the deal?  Why can't Python properly encode some of these
characters?  I know there are a host of different ISO-2022-JP
variants; could Python be using a different one than I think (the
default)?  I'm quite liking Python at the moment for a variety of
reasons (I suspect Perl will forever win when it comes to regular
expressions, but everything else is pretty darn nice), but this
is a bit worrying.
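
On the question of variants: Python ships several separate ISO-2022-JP
codecs, and the name "iso-2022-jp" selects only the base one.  A quick
way to check which of them accept a given string is sketched below.
`VARIANTS` and `codecs_that_accept` are illustrative names, the list is
meant to match the codec names in the standard library's encodings
table, and the sample is again a hypothetical half-width katakana
string rather than the real input.

```python
VARIANTS = [
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2",
    "iso2022_jp_2004", "iso2022_jp_3", "iso2022_jp_ext",
]


def codecs_that_accept(text, candidates=VARIANTS):
    """Return the codec names that can encode text without any replacement."""
    accepted = []
    for name in candidates:
        try:
            text.encode(name)  # strict mode: raises on the first bad character
        except UnicodeEncodeError:
            continue
        accepted.append(name)
    return accepted


# ASSUMPTION: hypothetical input containing half-width katakana.
sample = u"\uff92\uff84\uff9b\uff65"
print(codecs_that_accept(sample))
```

Whether a variant is acceptable depends on what the receiving end can
decode, so a codec that accepts the text is not automatically the
right one to send.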

-Joe