[Python-Dev] "data".decode(encoding) ?!

Michael Hudson mwh@python.net
13 May 2001 13:36:26 +0100


"M.-A. Lemburg" <mal@lemburg.com> writes:

> Fredrik Lundh wrote:
> > can you take that again?  shouldn't michael's example be
> > equivalent to:
> > 
> >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > 
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
> 
> Well, it is sort of broken, I agree. The reason is that 
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
> codec returns a Unicode object (which the .decode() method does).
> This is what's failing.

It strikes me that if someone executes

aString.decode("latin-1")

they're going to expect a unicode string.  AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected and then there is an attempt
to convert it back to an 8-bit string using the default encoding.  So
if I'd done a 

sys.setdefaultencoding("latin-1")

in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again?  This doesn't seem optimal.
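
To make that concrete, here's a rough Python-level sketch of the two
steps I believe are happening (the names are mine, and this is just my
reading of what PyString_Decode does, not the actual code):

    import sys

    data = u"\u00e3".encode("latin-1")   # an 8-bit latin-1 string, '\xe3'
    step1 = unicode(data, "latin-1")     # what I'd expect .decode() to return
    step2 = step1.encode(sys.getdefaultencoding())  # the coercion back to str

    # With the default ASCII encoding, step2 raises a UnicodeError; with a
    # default of "latin-1", step2 == data and the whole call is a no-op.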

> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)

Are all the codecs distributed with Python 2.1 unicode-related?  If
that's the case, PyString_Decode isn't terribly useful, is it?  It
seems unlikely that it has received much use.  Could be wrong, of course.
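
In Python terms, I imagine "return the codec's result as-is" would look
roughly like the following (illustrative only; the function name is mine,
but the tuple layout is what codecs.lookup() gives back):

    import codecs

    def decode_string(s, encoding, errors="strict"):
        # codecs.lookup() returns (encoder, decoder, streamreader, streamwriter)
        decode = codecs.lookup(encoding)[1]
        obj, consumed = decode(s, errors)
        return obj       # no forced coercion back to an 8-bit string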

OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation.  Do we want

open(file).read().decode("jpeg") -> some kind of PIL object

to be possible?
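
If we did, such a codec would presumably be registered the usual way;
something like this purely hypothetical sketch (the PIL usage and the
None stream reader/writer are assumptions on my part, nothing like this
ships with 2.1):

    import codecs

    def _jpeg_decode(data, errors="strict"):
        # hand back a PIL Image built from the raw bytes, plus the number
        # of bytes consumed, mimicking the codec decode protocol
        import StringIO
        import Image                     # PIL, assumed to be installed
        return Image.open(StringIO.StringIO(data)), len(data)

    def _jpeg_encode(obj, errors="strict"):
        raise NotImplementedError("encoding an image isn't sketched here")

    def _jpeg_search(name):
        if name == "jpeg":
            return (_jpeg_encode, _jpeg_decode, None, None)
        return None

    codecs.register(_jpeg_search)

    # open("picture.jpg", "rb").read().decode("jpeg") would then hand back
    # whatever _jpeg_decode returned -- *if* the machinery stopped coercing
    # the result to an 8-bit string.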

Cheers,
M.

-- 
  GET   *BONK*
  BACK  *BONK*
  IN    *BONK*
  THERE *BONK*             -- Naich using the troll hammer in cam.misc