[Python-Dev] unifying str and unicode

Fredrik Lundh fredrik at pythonware.com
Tue Oct 4 10:33:15 CEST 2005


James Y Knight wrote:

> Your point would be much easier to stomach if the "str" type could
> *only* hold 7-bit ASCII.

why?  strings are not mutable, so it's not like an ASCII string will suddenly sprout
non-ASCII characters.  what ends up in a string is defined by the string source.  if
you cannot trust the source, your programs will never work.  after all, there's
nothing in Python that keeps things like:

    s = file.readline().decode("iso-8859-1")
    s = elem.findtext("node")
    s = device.read_encoded_data()

from returning integers instead of strings, or returning socket objects on odd fridays.
but if the interface spec says that they always return strings that adhere to python's
text model (=unicode or things that can be mixed with unicode), you can trust them
as much as you can trust anything else in Python.

(this is of course also why we talk about file-like objects in Python, and sequences,
and iterators and iterables, and stuff like that.  it's not type(obj) that's important, it's
what you can do with obj and how it behaves when you do it)
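
a minimal sketch of that last point, for illustration only: the helper first_line
and the class Device are hypothetical names made up here, not part of any real
library.  the helper never checks type(obj); it only relies on the object having
a readline() method that returns text.

```python
import io


def first_line(source):
    # hypothetical helper: only assumes source.readline() returns a string
    return source.readline().rstrip("\n")


class Device:
    # hypothetical file-like object: not a file, but it behaves like one
    def __init__(self, lines):
        self._lines = iter(lines)

    def readline(self):
        # return the next line, or an empty string at end of data
        return next(self._lines, "")


from_file_like = first_line(io.StringIO("hello\nworld\n"))
from_device = first_line(Device(["hello\n", "world\n"]))
```

both calls go through the same code path, even though the two objects share no
common base class beyond object.  the contract is the documented behaviour, not
the type.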

</F> 




