iterparse and unicode
Fredrik Lundh
fredrik at pythonware.com
Thu Aug 21 01:48:57 EDT 2008
George Sakkis wrote:
> Thank you both for the suggestions. I made a few more experiments to
> understand how iterparse behaves with respect to three dimensions:
Spending time researching undefined behaviour is pretty pointless. ET
parsers expect byte streams, because that's what XML files are. If you
pass it anything else, an ET implementation may attempt to convert that
thing to a byte string, run the game "rogue", or do something else that
it finds appropriate.
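
As a minimal sketch of the supported case, the snippet below hands iterparse a byte stream and lets the parser do the decoding. The sample document and the helper name item_texts are made up for illustration; they are not from the original thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, encoded to bytes before parsing.
xml_bytes = (
    u'<?xml version="1.0" encoding="utf-8"?>'
    u'<root><item>caf\u00e9</item><item>tea</item></root>'
).encode("utf-8")

def item_texts(data):
    """Collect the text of every <item> element from XML given as bytes."""
    source = io.BytesIO(data)  # file-like object that yields bytes
    texts = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            texts.append(elem.text)
            elem.clear()  # release the element's contents once read
    return texts

texts = item_texts(xml_bytes)
```

If the data starts out as a Unicode string, encoding it to bytes before parsing keeps the call within defined behaviour.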
> It's interesting that the element text attributes after a successful
> parse do not necessarily have the same type, i.e. all be str or all
> unicode. I ported some text extraction code from BeautifulSoup (which
> handles all text as unicode) and I was surprised to find out that in
> xml.etree the returned text's type is not fixed, even within the same
> file. Although it's not a bug, having a mixed collection of byte and
> unicode strings from the same source makes me somewhat uneasy.
If you don't care about memory and execution performance, there are
plenty of toolkits that guarantee that you always get Unicode strings.
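
For code that stays with ET but wants a single text type, one possible approach is to normalise at the call site. This is only a sketch: the helper name as_text is hypothetical, and it assumes that any byte string returned by the parser contains ASCII only.

```python
def as_text(value):
    """Return value as a Unicode string; pass None and Unicode through."""
    if isinstance(value, bytes):
        # Assumption: byte strings coming from the parser are pure ASCII.
        return value.decode("ascii")
    return value

# Mixed input of the kind described above: bytes, Unicode, and a missing text.
normalised = [as_text(v) for v in (b"tea", u"caf\u00e9", None)]
```

The conversion costs an extra copy for every byte string, which is the memory and speed trade-off mentioned above.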
</F>