unicode encoding usability problem

aurora aurora00 at gmail.com
Fri Feb 18 13:53:36 EST 2005


On Fri, 18 Feb 2005 19:24:10 +0100, Fredrik Lundh <fredrik at pythonware.com>  
wrote:

> that's how you should do things in Python too, of course.  a unicode string
> uses unicode internally. decode on the way in, encode on the way out, and
> things just work.
>
> the fact that you can mess things up by mixing unicode strings with binary
> strings doesn't mean that you have to mix unicode strings with binary
> strings in your program.
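
For anyone following along, the pattern he's describing looks roughly like
this (a Python 2 sketch; the file names and the utf-8 encoding are my own
assumptions):

   raw = open('input.txt', 'rb').read()    # bytes from the outside world
   text = raw.decode('utf-8')              # decode on the way in
   # ... do all internal work on the unicode object 'text' ...
   out = open('output.txt', 'wb')
   out.write(text.encode('utf-8'))         # encode on the way out
   out.close()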

I don't want to mix them. But how can I find the places where they mix? How
do I know that this statement is a potential problem:

   if a == b:

where a and b may each have been created far away from the line of code that
brings them together?
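
To illustrate (a Python 2 sketch; the values are made up):

   a = u'caf\xe9'       # unicode, e.g. decoded from a form field
   b = 'caf\xc3\xa9'    # utf-8 bytes, e.g. read raw from a file
   a == b               # Python tries to coerce b to unicode as ASCII;
                        # depending on the version this either raises
                        # UnicodeDecodeError or warns and compares unequal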

In Java they are distinct data types, and the compiler would catch all
incorrect usage. In Python, the interpreter seems to 'help' us by silently
promoting binary strings to unicode. Things work fine, unit tests pass,
until the first non-ASCII characters come in and then the program breaks.
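
The promotion itself is easy to see (Python 2):

   'abc' == u'abc'      # True: the byte string is silently decoded as ASCII
   'abc' + u'def'       # u'abcdef', for the same reason
   '\xe9' + u'def'      # UnicodeDecodeError: the silent ASCII decode
                        # only fails once non-ASCII bytes appear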

Is there a scheme that Python developers can use so that they are safe from
incorrect mixing?
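
The closest thing I can think of (just a sketch; the helper name is my own
invention) is to assert the type at internal boundaries, so that even
pure-ASCII test data fails fast when a byte string sneaks in:

   def require_unicode(s):
       # fail immediately, even on ASCII data, if a byte string leaks in
       if not isinstance(s, unicode):
           raise TypeError('expected unicode, got %r' % type(s))
       return s

   def compare(a, b):
       return require_unicode(a) == require_unicode(b)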


