[Python-Dev] "data".decode(encoding) ?!
Michael Hudson
mwh@python.net
13 May 2001 13:36:26 +0100
"M.-A. Lemburg" <mal@lemburg.com> writes:
> Fredrik Lundh wrote:
> > can you take that again? shouldn't michael's example be
> > equivalent to:
> >
> > unicode(u"\u00e3".encode("latin-1"), "latin-1")
> >
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
>
> Well, it is sort of broken, I agree. The reason is that
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
> codec returns a Unicode object (which the .decode() method does).
> This is what's failing.
It strikes me that if someone executes
    aString.decode("latin-1")
they're going to expect a unicode string. AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected and then there is an attempt
to convert it back to an 8-bit string using the default encoding. So
if I'd done a
    sys.setdefaultencoding("latin-1")
in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again? This doesn't seem optimal.
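The round trip described above can be sketched in modern Python 3, where bytes and str stand in for the 8-bit and Unicode strings of Python 2.1. This is only an illustration under that assumption: decode_then_coerce and its default_encoding parameter are hypothetical names, not the actual PyString_Decode() code.

```python
def decode_then_coerce(data: bytes, encoding: str,
                       default_encoding: str = "ascii") -> bytes:
    """Hypothetical model: decode to text, then force the result back to 8 bits."""
    text = data.decode(encoding)           # 8-bit string -> Unicode text
    return text.encode(default_encoding)   # Unicode text -> 8-bit string again


raw = "\u00e3".encode("latin-1")  # the single byte 0xE3

# With an ASCII default encoding the coercion step fails,
# because U+00E3 has no ASCII representation.
try:
    decode_then_coerce(raw, "latin-1")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With a latin-1 default encoding the call hands back the original bytes.
round_trip = decode_then_coerce(raw, "latin-1", default_encoding="latin-1")
```

In this model the caller either gets an error or gets the input back unchanged, and never sees the Unicode result.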
> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)
Are all the codecs distributed with Python 2.1 unicode-related? If
that's the case, PyString_Decode isn't terribly useful, is it? It
seems unlikely that it has received much use. Could be wrong, of course.
OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation. Do we want
    open(file).read().decode("jpeg") -> some kind of PIL object
to be possible?
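As a rough sketch of what "return the codec's return object as-is" could look like, here is a toy codec in modern Python 3 whose decoder returns a dict rather than a string. The "kvpairs" codec and its helper functions are hypothetical and exist only for this illustration. It uses codecs.decode(), which in current Python passes the decoder's result through without checking its type. None of this is the 2.1 behaviour under discussion.

```python
import codecs


def _kv_encode(obj, errors="strict"):
    # Hypothetical encoder: dict -> b"key=value;key=value"
    data = ";".join(f"{k}={v}" for k, v in obj.items()).encode("ascii")
    return data, len(obj)


def _kv_decode(data, errors="strict"):
    # Hypothetical decoder: returns a dict, not a string object.
    raw = bytes(data)
    text = raw.decode("ascii")
    pairs = dict(item.split("=", 1) for item in text.split(";") if item)
    return pairs, len(raw)


def _search(name):
    # Codec search function: only answers for the made-up "kvpairs" codec.
    if name == "kvpairs":
        return codecs.CodecInfo(encode=_kv_encode, decode=_kv_decode,
                                name="kvpairs")
    return None


codecs.register(_search)

decoded = codecs.decode(b"width=640;height=480", "kvpairs")
```

Here decoded is an ordinary dict, so a "jpeg" codec returning an image object would be the same idea on a larger scale.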
Cheers,
M.
--
GET *BONK*
BACK *BONK*
IN *BONK*
THERE *BONK* -- Naich using the troll hammer in cam.misc