iterparse and unicode

Fredrik Lundh fredrik at pythonware.com
Wed Aug 27 05:42:29 EDT 2008


George Sakkis wrote:

>> if you meant to write "encode", you can indeed safely do
>> [s.encode('utf8') for s in strings] as long as all strings are returned
>> by an ET implementation.
> 
> I was replying to the general assertion that "in 2.x ASCII byte
> strings and unicode strings are compatible", not specifically about
> the strings returned by ET.

that assertion was made in the context of ET.  having to unilaterally
change the topic to "win" an argument is pretty lame.
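For the ET case the thread is about, here is a minimal sketch in present-day Python 3, where ElementTree always hands back text as `str`. The sample XML document is made up for illustration. It parses a small UTF-8 document with `iterparse` and shows that encoding every returned string is safe for ASCII and non-ASCII text alike:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: UTF-8 bytes with one ASCII and one non-ASCII text node.
data = "<root><a>hello</a><a>caf\u00e9</a></root>".encode("utf-8")

# iterparse reads from a binary file-like object; with the default events it
# yields ("end", element) pairs as each element is closed.
strings = [elem.text for event, elem in ET.iterparse(io.BytesIO(data)) if elem.tag == "a"]

# Every string ET returned is a str, so encoding each one never fails.
encoded = [s.encode("utf8") for s in strings]

print(strings)  # ['hello', 'café']
print(encoded)  # [b'hello', b'caf\xc3\xa9']
```

In 2.x the ET of the time could return pure-ASCII text as a byte string and other text as `unicode`; both types support `encode('utf8')` for such values, which is why the list comprehension quoted above works there too.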

and if you really meant to write "decode", you picked a rather stupid 
example to support your complaint about ET not returning Unicode -- your 
example does work fine for byte strings (whether they contain pure ASCII 
or not), but doesn't work at all for arbitrary Unicode strings, because 
decoding things that are already decoded makes very little sense (which 
explains why that method was removed in 3.0).

     >>> "hello".decode("utf-8")
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     AttributeError: 'str' object has no attribute 'decode'
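The transcript above is from 3.0. A small sketch in current Python 3 (sample values are made up) shows the same distinction between encoded strings (`bytes`) and Unicode strings (`str`): decoding goes from bytes to text, encoding goes from text to bytes, and an already decoded `str` simply has no `decode` method:

```python
# bytes hold encoded data; str holds (Unicode) text.
raw_ascii = b"hello"
raw_utf8 = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# bytes -> str: works for pure ASCII and for non-ASCII UTF-8 input alike.
decoded = [b.decode("utf-8") for b in (raw_ascii, raw_utf8)]
print(decoded)  # ['hello', 'café']

# A str is already decoded, so there is nothing left to decode.
text = "hello"
print(hasattr(text, "decode"))  # False

# str -> bytes is the job of encode().
print(text.encode("utf-8"))  # b'hello'
```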

are you sure you understand the distinction between Unicode strings and 
encoded strings?

</F>