unicode encoding usability problem

Sun Feb 20 05:01:58 EST 2005

aurora wrote:
> Lots of errors. Among them are gzip (binary?!) and strftime??

For gzip, this is not surprising. It contains things like

   self.fileobj.write('\037\213')

which is not intended to denote characters.
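
To make the gzip case concrete, here is a minimal sketch. It assumes a
modern Python 3 interpreter, where the b'' literal and gzip.compress()
are available; the name GZIP_MAGIC is made up for the illustration and
is not something the gzip module exports.

```python
import gzip

# The two bytes written by the line quoted above, as an explicit byte string.
# '\037\213' are octal escapes for the byte values 0x1f and 0x8b.
GZIP_MAGIC = b'\037\213'

# Every gzip stream starts with these two bytes.
compressed = gzip.compress(b'hello')
starts_with_magic = compressed[:2] == GZIP_MAGIC

# Treating them as text fails: 0x8b is outside the ASCII range.
try:
    GZIP_MAGIC.decode('ascii')
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False

print(starts_with_magic, decodes_as_ascii)
```

The second half is the point: the values are bytes of a file format, so
any attempt to interpret them as characters is bound to go wrong.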

> How about
> 
>   b'' - 8bit string; '' unicode string
> 
> and no automatic conversion.

This has been proposed before, see PEP 332. The problem is that
people often want byte strings to be mutable as well, so it is
still unclear whether it is better to make the b prefix denote
the current string type (so it would be currently redundant)
or a newly-created mutable string type (similar to array.array).
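
The difference between the two options can be sketched as follows. This
assumes a Python 3 interpreter and uses its bytes and bytearray types
only as stand-ins for "immutable byte string" and "mutable byte string";
array.array is the type mentioned above. It is an illustration of the
trade-off, not a statement of what the b prefix should mean.

```python
from array import array

# Option 1: the b prefix denotes an immutable byte string.
immutable = b'\037\213'
try:
    immutable[0] = 0          # item assignment is rejected
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

# Option 2: a mutable byte string, here modelled with bytearray ...
mutable = bytearray(immutable)
mutable[0] = 0                # in-place modification works

# ... which behaves much like an array of unsigned bytes.
arr = array('B', immutable)
arr[0] = 0

print(bytes_is_mutable, bytes(mutable), arr.tobytes())
```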

> Perhaps this can be activated by something
> like the encoding declarations, so that transition can happen module by
> module.

That could work for the literals - a __future__ import would be
most appropriate. For "no automatic conversion", this is very
difficult to implement on a per-module basis. The errors typically
don't occur in the module itself, but in some function called by
the module (e.g. a builtin method of the string type). So the
callee would have to know whether the caller has a future
import...
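
A rough sketch of what "the callee would have to know" means in
practice, under several assumptions: it targets CPython 3, it relies on
sys._getframe(), which is an implementation detail, and it uses the
existing "annotations" future flag purely as a stand-in for a
hypothetical "no automatic conversion" flag. The names callee and
run_module are invented for the example.

```python
import sys
import __future__

# Stand-in for a hypothetical "no automatic conversion" future flag.
FLAG = __future__.annotations.compiler_flag

def callee():
    # A future import only marks the code objects of the module that
    # contains it, so the callee has to look at its caller's frame.
    caller_flags = sys._getframe(1).f_code.co_flags
    return bool(caller_flags & FLAG)

def run_module(source):
    # Compile the source as a separate "module"; dont_inherit=True keeps
    # this file's own future flags from leaking into it.
    namespace = {"callee": callee}
    code = compile(source, "<module>", "exec", dont_inherit=True)
    exec(code, namespace)
    return namespace["result"]

with_future = run_module(
    "from __future__ import annotations\nresult = callee()\n")
without_future = run_module("result = callee()\n")

print(with_future, without_future)
```

Even where this works, every function whose behaviour depends on the
flag would need such a check, including builtin methods written in C,
which is why a per-module switch is hard to implement.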

Regards,
Martin