[OT] does the charset lie?

Skip Montanaro skip at pobox.com
Sun May 2 14:52:59 EDT 2004


    >> data = unicode(data, "iso-8859-1").encode("utf-8")
    >> data = map_entities_to_utf_8(data)
    >> data = unicode(data, "utf-8")

    David> Or, even simpler, skip the intermediate step:

    David>      data = unicode(data, "iso-8859-1")
    David>      data = map_entities_to_unicode(data)

    David> map_entities_to_unicode() could use htmlentitydefs.name2codepoint
    David> from the stdlib.
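A minimal sketch of David's one-step approach, written for Python 3. It assumes Python 3, where `htmlentitydefs` was renamed `html.entities` and `unicode(data, "iso-8859-1")` becomes `data.decode("iso-8859-1")`. `map_entities_to_unicode` is the hypothetical helper named above, not a stdlib function, and this version only handles named references such as `&eacute;`. It does not handle numeric ones like `&#233;`.

```python
import re
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs

# Named references only, e.g. "&eacute;"; numeric "&#233;" forms are not matched.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def map_entities_to_unicode(text):
    """Replace named HTML entities in an already-decoded str with their characters."""
    def _replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            # Unknown entity name: leave the original text untouched.
            return match.group(0)
        return chr(codepoint)
    return _ENTITY_RE.sub(_replace, text)

raw = b"caf&eacute; &amp; cr\xe8me &bogus;"
data = raw.decode("iso-8859-1")       # bytes -> str, no UTF-8 round trip needed
data = map_entities_to_unicode(data)
print(data)  # café & crème &bogus;
```

On Python 3.4 and later, `html.unescape()` does the same job and also covers numeric references, so a hand-rolled helper like this is mainly useful when you want control over which entities get replaced.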

Thanks, I always forget there's an htmlentitydefs module.

Skip