Finding a \u0096

Jeff Epler jepler at unpythonic.net
Wed Dec 4 09:54:53 EST 2002


You'd be best off converting your input to Unicode strings, using
    text = text.decode("cp1252")
doing all your conversions in terms of unicode characters
    text = text.replace(u"\u2013", "&ndash;")
    ...
and finally converting to UTF-8 on output:
    text = text.encode('utf-8')
    u'\u0096'.encode('utf-8')
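
For anyone reading this on a newer interpreter, here is a minimal sketch of the same three steps in Python 3 syntax, where the input is a bytes object and the u"" prefix is no longer needed. The function name cp1252_to_utf8 and the choice of a plain hyphen as the replacement are assumptions made for illustration only, not something from the original question.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode cp1252 bytes, edit the text as Unicode, re-encode as UTF-8.

    Hypothetical helper: the name and the hyphen replacement are
    illustrative assumptions.
    """
    # Step 1: bytes -> Unicode string. In cp1252 the byte 0x96 is an
    # en dash (U+2013), not the control character U+0096.
    text = raw.decode("cp1252")

    # Step 2: do every replacement on Unicode characters, not raw bytes.
    text = text.replace("\u2013", "-")

    # Step 3: Unicode string -> UTF-8 bytes for output.
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"1990\x962000 caf\xe9"
    print(cp1252_to_utf8(sample))
```

Decoding the same byte as latin-1 instead of cp1252 is what yields u"\u0096", which is probably where the character in the subject line comes from.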

Jeff



