[Python-Dev] Printing and __unicode__

Martin v. Loewis martin@v.loewis.de
13 Nov 2002 22:50:25 +0100


Guido van Rossum <guido@python.org> writes:

> > But it would break compatibility, at least with
> > xml.dom.minidom.Node.write, which supports StringIO currently, and will
> > collect Unicode strings in it.
> 
> Would it be acceptable if StringIO required you to be consistent,
> i.e. write only Unicode *or* only 8-bit strings, and never mix them?

From a strict point of view, that would be acceptable, since the DOM
is specified to be a Unicode thing. Unfortunately, neither the current
implementation nor common usage makes such a strict view reasonable.

It *would* be reasonable to mandate that all byte strings written to a
Unicode StringIO are ASCII, regardless of what the system encoding is;
however, the difference between that and the status quo is minor.
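
A rough sketch of that rule, for illustration only: this is not how
StringIO is implemented, the class name AsciiStrictBuffer is made up,
and the code assumes a Python 3 interpreter, where bytes stands in for
the 8-bit string type and str for the Unicode type.

```python
class AsciiStrictBuffer:
    """Hypothetical wide buffer: accepts Unicode text, and byte strings
    only if they are pure ASCII, independent of any system encoding."""

    def __init__(self):
        self._parts = []

    def write(self, data):
        if isinstance(data, bytes):
            # Decode strictly as ASCII; non-ASCII bytes raise UnicodeDecodeError.
            data = data.decode("ascii")
        elif not isinstance(data, str):
            raise TypeError("expected str or bytes, got %s" % type(data).__name__)
        self._parts.append(data)

    def getvalue(self):
        return "".join(self._parts)


buf = AsciiStrictBuffer()
buf.write(b"<")            # ASCII byte string: accepted
buf.write("caf\u00e9")     # Unicode string: accepted
buf.write(b"/>")           # ASCII byte string: accepted
result = buf.getvalue()

try:
    buf.write(b"\xe9")     # non-ASCII byte string: rejected
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

The only difference from decoding with the system encoding is the
hard-wired "ascii" codec, which is why the change would be small.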

To give an example, just consider

        if self.childNodes:
            writer.write(">%s"%(newl))
            for node in self.childNodes:
                node.writexml(writer,indent+addindent,addindent,newl)
            writer.write("%s</%s>%s" % (indent,self.tagName,newl))
        else:
            writer.write("/>%s"%(newl))

Here, writer is often a StringIO instance; indent, addindent, and newl
are byte strings (as are all the literals), self.tagName might be a
Unicode string, and the orientation of the StringIO might be wide as
well.
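
For reference, the call pattern itself can be tried out as below. This
is only a sketch and it assumes a Python 3 interpreter, where every
literal is already Unicode and io.StringIO accepts nothing else, so it
shows the writexml arguments rather than the mixing problem. The
element names are made up.

```python
import io
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement("r\u00e9sum\u00e9")   # non-ASCII tag name
root.appendChild(doc.createElement("item"))    # one child element

writer = io.StringIO()
# Arguments after the writer: indent, addindent, newl
root.writexml(writer, "", "  ", "\n")
output = writer.getvalue()
```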

> That would be some kind of magical behavior; the encoding attribute
> should be set to reflect the mode after the first write, and should
> be None initially (or some other way to indicate the magic).

While I sympathise with that architecture, a migration strategy would
be needed.
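
To make the migration problem concrete, here is a minimal sketch of the
"consistent" behaviour under discussion. It is not a proposed
implementation: the class FirstWriteBuffer and its mode attribute are
hypothetical, and the code assumes a Python 3 interpreter, with bytes
playing the role of the 8-bit string and str the Unicode string.

```python
class FirstWriteBuffer:
    """Hypothetical buffer whose mode is fixed by the first write."""

    def __init__(self):
        self._parts = []
        self.mode = None   # None until the first write decides it

    def write(self, data):
        if not isinstance(data, (bytes, str)):
            raise TypeError("expected str or bytes")
        if self.mode is None:
            # The first write selects narrow (bytes) or wide (str) mode.
            self.mode = bytes if isinstance(data, bytes) else str
        elif not isinstance(data, self.mode):
            raise TypeError("cannot mix 8-bit and Unicode strings")
        self._parts.append(data)

    def getvalue(self):
        empty = b"" if self.mode is bytes else ""
        return empty.join(self._parts)


writer = FirstWriteBuffer()
writer.write(b">\n")            # a byte-string literal goes first
try:
    writer.write("tagName")     # a Unicode tag name follows
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Code that writes a byte-string literal first and a Unicode tag name
later, as the minidom fragment above does, would start failing under
such a rule.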

Python 2.3 will eliminate some of the pressure, by allowing
applications to specify an encoding when they write back XML; if they
do specify an encoding, the resulting stream will be narrow. Of
course, it is then up to the application to actually specify the output
encoding (which, admittedly, should have been mandated from day 1).
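
As an illustration of the effect, the sketch below uses the encoding
argument of minidom's toxml. It assumes a Python 3 interpreter, where
the narrow result is a bytes object and the wide result a str; the
document content is made up.

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement("greeting")
root.appendChild(doc.createTextNode("gr\u00fc\u00df"))
doc.appendChild(root)

wide = doc.toxml()                      # no encoding given: Unicode result
narrow = doc.toxml(encoding="utf-8")    # encoding given: byte-string result
```

With an explicit encoding the XML declaration also names that encoding,
so the narrow output is self-describing.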

Regards,
Martin