Get document as normal text and not as binary data

Kent Johnson kent37 at tds.net
Mon Mar 28 14:37:11 EST 2005


Markus Franz wrote:
> Hi.
> 
> I used urllib2 to load an HTML document over HTTP. My problem is that
> the loaded contents appear to be returned as binary data: every
> character is displayed like lÀÃt, for example. How can I get the
> contents as normal text?

My guess is that the HTML is UTF-8 encoded: your sample looks like UTF-8 bytes interpreted as Latin-1. Try decoding the bytes explicitly:
contents = f.read().decode('utf-8')
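
To see the effect without touching the network, here is a minimal sketch. The sample word is made up for illustration (it is not taken from your page), and the byte string stands in for what f.read() would return if the server really sends UTF-8:

```python
# Hypothetical sample text; stands in for the body returned by f.read().
# u"caf\u00e9" is the word "cafe" with an acute accent on the final e.
raw = u"caf\u00e9".encode("utf-8")   # the accented letter becomes two bytes

# Treating each byte as one Latin-1 character yields two junk characters
# in place of the single accented letter.
wrong = raw.decode("latin-1")

# Decoding the same bytes as UTF-8 recovers the original text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

If the page turns out to use a different encoding, decode('utf-8') will either raise UnicodeDecodeError or give wrong characters, so check the charset the server declares before relying on this.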

Kent

> 
> My script was:
> 
> import urllib2
> req = urllib2.Request(url)
> f = urllib2.urlopen(req)
> contents = f.read()
> print contents
> f.close()
> 
> Thanks!
> 
> Markus


