[Python-Dev] Removing the implicit str() call from printing API

Ka-Ping Yee ping@lfw.org
Sat, 10 Feb 2001 16:41:41 -0800 (PST)


On Sat, 10 Feb 2001, Andy Robinson wrote:
> > So far, no one has commented on this idea.
> >
> > I would like to go ahead and check in a patch which passes through
> > Unicode objects to the file-object's .write() method while leaving
> > the standard str() call for all other objects in place.
> >
> I'm behind this in principle.  Here's an example of why:
>
> >>> tokyo_utf8 = "東京"   # the kanji for Tokyo, trust me...
> >>> print tokyo_utf8   # this is 8-bit and prints fine
> 東京
> >>> tokyo_uni = codecs.utf_8_decode(tokyo_utf8)[0]
> >>> print tokyo_uni    # try to print the kanji
> Traceback (innermost last):
>   File "<interactive input>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

Something like the following looks reasonable to me; the added
complexity is that the file object now remembers an encoder/decoder
pair in its state (the API might give the appearance of remembering
just the codec name, but we want to avoid doing codecs.lookup() on
every write), and uses it whenever write() is passed a Unicode object.

    >>> file = open('outputfile', 'w', 'utf-8')
    >>> file.encoding
    'utf-8'
    >>> file.write(tokyo_uni)      # tokyo_utf8 gets written to file
    >>> file.close()
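
To make the "look up the codec once, reuse it on every write" idea concrete, here is a rough sketch in present-day Python 3 syntax. The class name EncodedWriter and the demo() helper are made up for illustration; this is not an existing file-object API, only one way the proposed behaviour could be arranged, assuming the underlying stream accepts bytes.

```python
import codecs
import io


class EncodedWriter:
    """Hypothetical wrapper, for illustration only.

    It remembers the encoder found by codecs.lookup() so that
    write() never has to repeat the lookup.
    """

    def __init__(self, raw, encoding):
        info = codecs.lookup(encoding)   # done once, at open time
        self.encoding = info.name        # what the API exposes
        self._encode = info.encode       # what the object really remembers
        self._raw = raw                  # underlying byte stream

    def write(self, data):
        # Unicode text goes through the remembered encoder;
        # byte strings are passed through untouched.
        if isinstance(data, str):
            data, _consumed = self._encode(data)
        self._raw.write(data)

    def close(self):
        self._raw.close()


def demo():
    buf = io.BytesIO()                   # stands in for a real file
    out = EncodedWriter(buf, 'utf-8')
    out.write('\u6771\u4eac')            # the kanji for Tokyo
    out.write(b'\n')                     # bytes pass straight through
    return out.encoding, buf.getvalue()
```

Note that this sketch silently accepts both text and bytes, which is only one possible answer to the second open question below.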

Open questions:

    - If an encoding is specified, should file.read() then
      always return Unicode objects?

    - If an encoding is specified, should file.write() only
      accept Unicode objects and not bytestrings?

    - Is the encoding attribute mutable?  (I would prefer not,
      but then how to apply an encoding to sys.stdout?)
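
One possible way around a mutable encoding attribute, sketched in present-day Python 3 syntax: leave the original stream alone and wrap it in a new object that carries the encoding. The codecs module's getwriter() and getreader() functions return stream wrapper classes that do this. The helper names wrap_for_writing and wrap_for_reading are made up here, and an in-memory io.BytesIO stands in for sys.stdout or a real file.

```python
import codecs
import io


def wrap_for_writing(byte_stream, encoding):
    """Return a new object that encodes text before writing it.

    The original stream is left unchanged, so no attribute on it
    has to be mutable.
    """
    return codecs.getwriter(encoding)(byte_stream)


def wrap_for_reading(byte_stream, encoding):
    """Return a new object whose read() yields decoded text."""
    return codecs.getreader(encoding)(byte_stream)


# Writing: text in, encoded bytes in the underlying stream.
sink = io.BytesIO()
writer = wrap_for_writing(sink, 'utf-8')
writer.write('\u6771\u4eac')             # the kanji for Tokyo
encoded = sink.getvalue()

# Reading: the same bytes come back as text.
reader = wrap_for_reading(io.BytesIO(encoded), 'utf-8')
decoded = reader.read()
```

In this arrangement read() on the wrapper always returns text, which corresponds to answering "yes" to the first question for the wrapped object only.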

Side question: I noticed that the Lib/encodings directory supports
quite a few code pages, including Greek and Russian, but there are no
ISO-2022 CJK or JIS codecs.  Is this just because no one felt like
writing one, or is there a reason not to include one?  It seems to
me it might be nice to include some codecs for the most common CJK
encodings -- that recent note on the popularity of Python in Korea
comes to mind.


-- ?!ng

Happiness comes more from loving than being loved; and often when our
affection seems wounded it is only our vanity bleeding. To love, and
to be hurt often, and to love again--this is the brave and happy life.
    -- J. E. Buchrose