Forcing output as unicode for web

Skip Montanaro skip at pobox.com
Fri Feb 28 21:16:41 EST 2003


    Hasan> (asian characters, pretty much) when the script hits a non-ascii
    Hasan> rss feed, it crashes, how do i solve this?

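You need to know the encoding of the rss feed you've grabbed.  That's
probably given by the encoding attribute of the <?xml ... ?> declaration
on the first line of the feed (if there isn't one, utf-8 is the usual
default).  Once you've found that, you need to convert (decode) the bytes
you read to Unicode: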

    rssDocument = unicode(rssDocument, encoding)

then later, when you want to emit bits of it, encode the Unicode into utf-8,
e.g.:

    n.textOf(n.first(item, 'title'))

becomes

    n.textOf(n.first(item, 'title')).encode("utf-8")

(assuming n.textOf() returns a Unicode object).
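
In case it helps, here is a rough sketch of the whole round trip: pull the
encoding out of the XML declaration, decode the raw bytes to Unicode, and
encode to utf-8 on the way out.  The helper names (sniff_encoding,
to_unicode, to_utf8) are made up for illustration and are not part of your
script or of any library.  raw.decode(encoding) does the same job as
unicode(raw, encoding) above.  The regular expression is an assumption
that only holds for ASCII-compatible encodings; it won't spot a utf-16
feed.

```python
import re

# Hypothetical helper pattern: finds encoding="..." in an XML declaration
# at the very start of the document. Assumes an ASCII-compatible encoding.
_XML_DECL = re.compile(
    br'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(raw, default="utf-8"):
    """Return the encoding named in the XML declaration, else `default`."""
    match = _XML_DECL.match(raw)
    if match:
        return match.group(1).decode("ascii")
    return default


def to_unicode(raw):
    """Decode the raw feed bytes using the encoding the feed declares."""
    return raw.decode(sniff_encoding(raw))


def to_utf8(text):
    """Encode Unicode text as utf-8 bytes, ready to write to the page."""
    return text.encode("utf-8")


# Build a small Shift_JIS feed to stand in for the bytes read off the wire.
raw_feed = (
    u'<?xml version="1.0" encoding="Shift_JIS"?>'
    u'<rss><title>\u65e5\u672c\u8a9e</title></rss>'
).encode("shift_jis")

document = to_unicode(raw_feed)   # Unicode from here on
page_bytes = to_utf8(document)    # utf-8 only at the point of output
```

The point of the design is to decode once, right after reading, work with
Unicode everywhere in between, and encode once, right before writing.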

Skip




