[2.5.1] ShiftJIS to Unicode?
skip at pobox.com
Wed Nov 26 19:32:12 EST 2008
Gilles> ======
Gilles> m = try.search(the_page)
Gilles> if m:
Gilles>     #UnicodeEncodeError: 'charmap' codec can't encode characters in
Gilles>     #position 49-55: character maps to <undefined>
Gilles>     title = m.group(1).decode('shift_jis').strip()
Gilles> ======
Gilles> Has someone successfully accessed Shift-JIS-encoded Japanese
Gilles> contents with Python?
Have you verified that the characters in position 49-55 are actually
Shift-JIS characters? In my experience, problems decoding a source string in
any given character set are caused by errors in the source, not errors in
Python.
OTOH, the characters in position 49-55 look like plain old ASCII to me.
Does Shift-JIS have ASCII as a proper subset?
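
One way to check is to look at the raw bytes in the reported range and to
try the decode on its own. What follows is only a rough sketch, not the
original poster's code. It is written in Python 3 syntax (the thread is
about 2.5.1, where the str/unicode split works differently), and the helper
names describe_span and try_decode, as well as the sample bytes, are made up
for illustration.

```python
def describe_span(raw, start, end):
    """Return the bytes at positions start..end (inclusive) and whether
    every one of them is below 0x80, i.e. in the 7-bit range."""
    chunk = raw[start:end + 1]
    is_seven_bit = all(b < 0x80 for b in chunk)
    return chunk, is_seven_bit


def try_decode(raw, encoding="shift_jis"):
    """Decode raw bytes; return (text, None) on success or (None, error)."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Hypothetical sample: 7-bit markup wrapped around a Shift-JIS encoded title.
sample = b"<title>" + "日本語".encode("shift_jis") + b"</title>"

# Inspect a span of interest (here positions 0-6, the opening tag).
chunk, is_seven_bit = describe_span(sample, 0, 6)

# Attempt the decode separately from any printing or re-encoding step.
text, error = try_decode(sample)
```

If the span turns out to be 7-bit and the decode succeeds, the decode step is
probably not where things go wrong, and the input itself or a later step is
worth a closer look.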
--
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/