unicode, bytes redux

Mon Sep 25 03:33:39 EDT 2006

willie <willie at jamots.com> wrote:

> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.

So what sort of output do you expect from this:

>>> a = '\xc9'.decode('latin1')
>>> b = '\xc3\x89'.decode('utf8')
>>> print (a+b).bytes()
???

And if you say that's an unfair question because you expected all the byte
strings to be using the same encoding, then there's no point storing it on
every unicode object; you might as well store it once globally.
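
To make the point concrete, here is a minimal sketch of asking for the byte count explicitly, at the point where an encoding is actually chosen. It is written in Python 3 syntax (the session above is Python 2), and the names s and sizes are mine, used only for illustration:

```python
# Two byte strings in different encodings decode to the same character.
a = b'\xc9'.decode('latin1')
b = b'\xc3\x89'.decode('utf8')
assert a == b == '\u00c9'

# The concatenation has two code points and no single "source" encoding.
s = a + b

# The byte count is only defined once an encoding is named explicitly.
sizes = {enc: len(s.encode(enc)) for enc in ('latin1', 'utf8', 'utf-16-le')}
print(sizes)  # {'latin1': 2, 'utf8': 4, 'utf-16-le': 4}
```

The same two code points take 2 bytes in latin1 and 4 in utf8, so the answer depends on the encoding you name when you encode, not on anything the unicode object could have remembered.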