iterparse and unicode

Wed Aug 27 11:00:47 EDT 2008

On Aug 27, 5:42 am, Fredrik Lundh <fred... at pythonware.com> wrote:

> George Sakkis wrote:
> >> if you meant to write "encode", you can indeed safely do
> >> [s.encode('utf8') for s in strings] as long as all strings are returned
> >> by an ET implementation.
>
> > I was replying to the general assertion that "in 2.x ASCII byte
> > strings and unicode strings are compatible", not specifically about
> > the strings returned by ET.
>
> that assertion was made in the context of ET.  having to unilaterally
> change the topic to "win" an argument is pretty lame.

I took Stefan's comment as a general statement, not in the context of
ET. Feeling the need to keep "defending" ET at this point is, to
borrow your words, pretty lame.

> and if you really meant to write "decode", you picked a rather stupid
> example to support your complaint about ET not returning Unicode -- your
> example does work fine for byte strings (whether they contain pure ASCII
> or not), but doesn't work at all for arbitrary Unicode strings, because
> decoding things that are already decoded makes very little sense (which
> explains why that method was removed in 3.0).

The thing is, a user might be happily using ET and call "decode" on
the returned byte strings as long as the files happen to be all ASCII,
without knowing that ET may also return Unicode. Then, after some weeks,
months, or years, a non-ASCII file comes in and the program breaks. As far
as I am concerned, it's a documentation issue, nothing more.
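
To make the failure mode concrete, here is a minimal sketch of the defensive
pattern it implies: decode only when the value really is a byte string, and
pass Unicode through untouched. The helper name to_text and the sample
document are assumptions made up for illustration; neither is part of ET.
On 3.x ET always returns Unicode, so the decode branch only matters on 2.x.

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string (hypothetical helper, not part of ET)."""
    if isinstance(value, bytes):
        # Byte string (what 2.x ET hands back for pure-ASCII text): decode it.
        return value.decode(encoding)
    # Already Unicode, or None for an element without text: leave it alone.
    return value


# Sample input with one ASCII-only element and one non-ASCII element.
root = ET.fromstring(b"<doc><a>plain</a><b>caf\xc3\xa9</b></doc>")
texts = [to_text(el.text) for el in root]
```

Calling el.text.decode('utf8') directly instead would work for the first
element on 2.x and blow up on the second, which is exactly the surprise
described above.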

George