[Python-3000] Draft PEP for New IO system

Walter Dörwald walter at livinglogic.de
Tue Feb 27 22:47:26 CET 2007


Guido van Rossum wrote:

> On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> [...]
>> The basic principle is that these codecs can encode strings and decode
>> bytes in multiple chunks. If you want to encode a unicode string u in
>> UTF-16 you can do it in one go:
>>     s = u.encode("utf-16")
>> or character by character:
>>     encoder = codecs.lookup("utf-16").incrementalencoder()
>>     s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
>> The incremental encoder makes sure that the result contains only one
>> BOM.
>>
>> Decoding works in the same way:
>>     decoder = codecs.lookup("utf-16").incrementaldecoder()
>>     u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)
> 
> Thanks for the explanations, it is a little bit clearer now!
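
For reference, here is the same round trip as a self-contained sketch. It
assumes Python 3 semantics, where the encoder returns bytes; the 2.5
spelling quoted above differs mainly in the string literals. The sample
text is made up for illustration.

```python
import codecs

text = "h\xe9llo"  # illustrative sample text

# One-shot encoding: a single BOM followed by the payload.
one_shot = text.encode("utf-16")

# Incremental encoding, one character at a time.
# The final call with final=True flushes any pending state.
encoder = codecs.lookup("utf-16").incrementalencoder()
chunked = b"".join(encoder.encode(c) for c in text) + encoder.encode("", True)

# Incremental decoding, one byte at a time. The decoder buffers
# incomplete code units until enough bytes have arrived.
decoder = codecs.lookup("utf-16").incrementaldecoder()
pieces = [decoder.decode(chunked[i:i + 1]) for i in range(len(chunked))]
decoded = "".join(pieces) + decoder.decode(b"", True)
```

Under these assumptions the chunked result should be identical to the
one-shot result, so the BOM appears exactly once at the start.
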
> 
> [...]
>> >> Should it be possible to change the error handling during the lifetime
>> >> of a stream? Then this change would have to be passed through to the
>> >> underlying codec.
>> >
>> > Not unless you have a really good use case handy...
>>
>> Not for decoding, but for encoding: If you're outputting XML and use an
>> encoding that can't encode all unicode characters, then it makes sense
>> to switch to "xmlcharrefreplace" error handling during the output of
>> text nodes (and back to "strict" for element names etc.).
> 
> So do the incremental codecs allow this switching?

Yes:

 >>> import codecs
 >>> ci = codecs.lookup("ascii")
 >>> enc = ci.incrementalencoder(errors="xmlcharrefreplace")
 >>> enc.encode(u"\xff")
'&#255;'
 >>> enc.errors = "strict"
 >>> enc.encode(u"\xff")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.5/encodings/ascii.py", line 22, in encode
     return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)

And it's documented that changing the errors attribute is allowed:
    http://docs.python.org/lib/incremental-encoder-objects.html
    http://docs.python.org/lib/incremental-decoder-objects.html
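
For the XML use case, the switching could look roughly like the sketch
below. It is only an illustration, written for Python 3 (bytes output);
the helper name write_text and the sample element are hypothetical and not
part of any library.

```python
import codecs

# Markup is encoded strictly, so an unencodable element name is an error.
encoder = codecs.lookup("ascii").incrementalencoder(errors="strict")

def write_text(encoder, text):
    """Hypothetical helper: encode a text node with character references."""
    previous = encoder.errors
    encoder.errors = "xmlcharrefreplace"
    try:
        return encoder.encode(text)
    finally:
        # Switch back so that markup is checked strictly again.
        encoder.errors = previous

out = b"".join([
    encoder.encode("<name>"),
    write_text(encoder, "J\xfcrgen"),
    encoder.encode("</name>"),
])
# out == b"<name>J&#252;rgen</name>"
```

A real serializer would also have to escape "<" and "&" in text nodes; that
is left out here to keep the focus on the errors attribute.
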

Servus,
    Walter


