[2.5.1] ShiftJIS to Unicode?

MRAB google at mrabarnett.plus.com
Wed Nov 26 20:00:28 EST 2008


Gilles Ganault wrote:
> Hello
> 
> 	I'm trying to read pages from Amazon JP, whose web pages are
> supposed to be encoded in ShiftJIS, and decode contents into Unicode
> to keep Python happy:
> 
> www.amazon.co.jp
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS"
> /> 
> 
> But this doesn't work:
> 
> ======
> m = try.search(the_page)

How can you have a variable named "try"? It's a reserved word!

> if m:
> 	# UnicodeEncodeError: 'charmap' codec can't encode characters in
> 	# position 49-55: character maps to <undefined>
> 	title = m.group(1).decode('shift_jis').strip()
> ======
> 
> Has someone successfully accessed Shift-JIS-encoded Japanese contents
> with Python?
> 
No problem here:

 >>> import urllib
 >>> data = urllib.urlopen("http://www.amazon.co.jp/").read()
 >>> decoded_data = data.decode("shift-jis")
 >>>
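
Note that the error quoted above is a UnicodeEncodeError, which is raised when encoding rather than decoding, so it may be coming from a later step (for example, printing the title to a console that can't display Japanese) and not from decode() itself. Here is a minimal sketch of the decode-then-search approach. The helper name extract_title, the title regex, and the sample page are my own assumptions for illustration, not taken from your script, and the sample bytes stand in for a downloaded page so it runs without network access:

```python
import re

# Hypothetical pattern; the original script's regex was not shown.
TITLE_RE = re.compile(u"<title>(.*?)</title>", re.S | re.I)

def extract_title(page_bytes, encoding="shift_jis"):
    """Decode the raw page bytes first, then search the Unicode text."""
    # "replace" substitutes U+FFFD for any bytes that are not valid Shift_JIS.
    text = page_bytes.decode(encoding, "replace")
    m = TITLE_RE.search(text)
    if m:
        return m.group(1).strip()
    return None

# Made-up sample page (not Amazon's), encoded as Shift_JIS bytes.
sample_title = u"\u65e5\u672c\u8a9e"
page_bytes = (u"<html><title> " + sample_title + u" </title></html>").encode("shift_jis")

title = extract_title(page_bytes)

# Encode explicitly before writing to an output that may not support
# Japanese; backslashreplace turns unencodable characters into \uXXXX escapes.
safe = title.encode("ascii", "backslashreplace")
```

The title stays a Unicode string until the very last step, and the explicit encode at the end is where you choose what happens to characters the output can't represent.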


