[Python-Dev] Generalised String Coercion

Sun Aug 7 15:35:49 CEST 2005

Guido van Rossum wrote:
> My first response to the PEP, however, is that instead of a new
> built-in function, I'd rather relax the requirement that str() return
> an 8-bit string -- after all, int() is allowed to return a long, so
> why couldn't str() be allowed to return a Unicode string?

The problem here is that strings and Unicode are used in different
ways, whereas integers and longs are very similar. Strings are used
for both arbitrary data and text data; Unicode can only be used
for text data.

The new text() built-in would help make a clear distinction
between "convert this object to a string of bytes" and
"please convert this to a text representation". We need to
start making the separation somewhere and I think this is
a good non-invasive start.

Furthermore, the text() built-in could be used to only
allow 8-bit strings with ASCII content to pass through
and require that all non-ASCII content be returned as
Unicode.

We wouldn't be able to enforce this in str().
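
To make the ASCII rule concrete, here is a rough sketch of how such a
text() function might behave. This is only my reading of the idea, not
the PEP's reference implementation. Assumptions: it is written for a
current Python 3 interpreter, so bytes stands in for the 8-bit string
type and str for Unicode, and raising TypeError for non-ASCII 8-bit
strings is just one possible way to enforce the rule.

```python
def text(obj):
    """Hypothetical sketch of the proposed text() built-in."""
    # Unicode text passes through unchanged.
    if isinstance(obj, str):
        return obj
    # bytes stands in for the 8-bit string type (an assumption of this sketch).
    if isinstance(obj, bytes):
        try:
            # Decode only to check the content; the result is discarded.
            obj.decode('ascii')
        except UnicodeDecodeError:
            # Assumed policy: non-ASCII content must arrive as Unicode.
            raise TypeError('non-ASCII 8-bit string; use Unicode instead')
        return obj
    # Anything else is converted to its text representation.
    return str(obj)
```

With this sketch, text(b'abc') returns the 8-bit string unchanged, while
text(b'gr\xfc') is rejected and the caller has to supply Unicode.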

I'm +1 on adding text().

I would also like to suggest a new formatting marker '%t'
to have the same semantics as text() - instead of changing
the semantics of %s as Neil suggests in the PEP. Again,
the reason is to make the difference between text and
arbitrary data explicit and visible in the code.

> The main problem for a smooth Unicode transition remains I/O, in my
> opinion; I'd like to see a PEP describing a way to attach an encoding
> to text files, and a way to decide on a default encoding for stdin,
> stdout, stderr.

Hmm, not sure why you need PEPs for this:

Open an encoded file:
---------------------
Use codecs.open() instead of open() or file().
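
A minimal sketch of what that looks like, assuming a current Python 3
interpreter. The file name and the choice of UTF-8 are illustrative
only:

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')  # hypothetical file name

    # Text written through the codec is encoded to UTF-8 on disk.
    with codecs.open(path, 'w', encoding='utf-8') as f:
        f.write('gr\xfc\xdfe\n')

    # Reading through the codec decodes the bytes back to text.
    with codecs.open(path, 'r', encoding='utf-8') as f:
        text_read = f.read()

    # A plain binary read shows the encoded bytes that were stored.
    with open(path, 'rb') as f:
        raw = f.read()
```

The caller only ever handles text; the encoding is applied at the file
boundary.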

Set the external encoding for stdin, stdout, stderr:
----------------------------------------------------
(also an example for adding encoding support to an
existing file object):

import codecs
import sys

def set_sys_std_encoding(encoding):
    # Load encoding support
    (encode, decode, streamreader, streamwriter) = codecs.lookup(encoding)
    # Wrap using stream writers and readers
    sys.stdin = streamreader(sys.stdin)
    sys.stdout = streamwriter(sys.stdout)
    sys.stderr = streamwriter(sys.stderr)
    # Add .encoding attribute for introspection
    sys.stdin.encoding = encoding
    sys.stdout.encoding = encoding
    sys.stderr.encoding = encoding

set_sys_std_encoding('rot-13')

Example session:
>>> print 'hello'
uryyb
>>> raw_input()
hello
'hello'
>>> 1/0
Genpronpx (zbfg erprag pnyy ynfg):
  Svyr "<fgqva>", yvar 1, va ?
MrebQvivfvbaReebe: vagrtre qvivfvba be zbqhyb ol mreb

Note that the interactive session bypasses the sys.stdin
redirection, which is why you can still enter Python
commands in ASCII - not sure whether there's a reason
for this, or whether it's just a missing feature.
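
The wrapping step used in set_sys_std_encoding() can also be tried on
an in-memory stream without touching the real sys.stdout. A small
sketch, assuming a current Python 3 interpreter, where rot-13 is a
text-to-text codec and io.StringIO stands in for the output stream:

```python
import codecs
import io

# Look up the stream writer class for the codec.
streamwriter = codecs.getwriter('rot-13')

# Wrap an in-memory stream instead of sys.stdout.
buffer = io.StringIO()
wrapped = streamwriter(buffer)

# Everything written through the wrapper is encoded first.
wrapped.write('hello\n')
result = buffer.getvalue()
```

Here result holds the rot-13 form of the written text, matching what
the print statement produced in the session above.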

-- 
Marc-Andre Lemburg
eGenix.com
